THE DICHOTOMY BETWEEN STRUCTURE AND RANDOMNESS, 
ARITHMETIC PROGRESSIONS, AND THE PRIMES 

TERENCE TAO 


Abstract. A famous theorem of Szemerédi asserts that all subsets of the 
integers with positive upper density will contain arbitrarily long arithmetic 
progressions. There are many different proofs of this deep theorem, but they 
are all based on a fundamental dichotomy between structure and randomness, 
which in turn leads (roughly speaking) to a decomposition of any object 
into a structured (low-complexity) component and a random (discorrelated) 
component. Important examples of these types of decompositions include the 
Furstenberg structure theorem and the Szemerédi regularity lemma. One recent 
application of this dichotomy is the result of Green and Tao establishing 
that the prime numbers contain arbitrarily long arithmetic progressions 
(despite having density zero in the integers). The power of this dichotomy is 
evidenced by the fact that the Green-Tao theorem requires surprisingly little 
technology from analytic number theory, relying instead almost exclusively on 
manifestations of this dichotomy such as Szemerédi’s theorem. In this paper 
we survey various manifestations of this dichotomy in combinatorics, harmonic 
analysis, ergodic theory, and number theory. As we hope to emphasize here, 
the underlying themes in these arguments are remarkably similar even though 
the contexts are radically different. 


1. Introduction 

In 1975, Szemerédi [?] proved the following deep and enormously influential 
theorem: 

Theorem 1.1 (Szemerédi’s theorem). Let A be a subset of the integers Z of positive 
upper density, thus $\limsup_{N \to \infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0$. Here |A| denotes the cardinality 
of a set A, and [-N, N] denotes the integers between -N and N. Then for any 
$k \geq 3$, A contains infinitely many arithmetic progressions of length k. 

Several proofs of this theorem are now known. The original proof of Szemerédi 
[?] was combinatorial. A later proof of Furstenberg [?] used ergodic theory 
and has led to many extensions. A more quantitative proof of Gowers [?] was 
based on Fourier analysis and arithmetic combinatorics (extending a much older 
argument of Roth [?] handling the k = 3 case). A fourth proof by Gowers [?] and 
Rödl, Nagle, Schacht, and Skokan [?] relied on the structural theory 
of hypergraphs. These proofs are superficially all very different (with each having 
their own strengths and weaknesses), but have a surprising number of features in 
common. The main difficulty in all of the proofs is that one a priori has no control 
on the behaviour of the set A other than a lower bound on its density; A could 
range from being a very random set, to a very structured set, to something in 
between. In each of these cases, A will contain many arithmetic progressions - but 
the reason for having these progressions varies from case to case. Let us illustrate 
this by informally discussing some representative examples: 

• (Random sets) Let $0 < \delta < 1$, and let A be a random subset of Z, with 
each integer n lying in A with an independent probability of $\delta$. Then A 
almost surely has upper density $\delta$, and it is easy to establish that A almost 
surely has infinitely many arithmetic progressions of length k, basically 
because each progression of length k in Z has a probability of $\delta^k$ of also 
lying in A. A more refined version of this argument also applies when A is 
pseudorandom rather than random - thus we allow A to be deterministic, 
but require that a suitable number of correlations (e.g. pair correlations, or 
higher order correlations) of A are negligible. The argument also extends 
to sparse random sets, for instance one where $\mathbf{P}(n \in A) \sim 1/\log n$. 

• (Linearly structured sets) Consider a quasiperiodic set such as 
$A := \{n : \{\alpha n\} \leq \delta\}$, where $0 < \delta < 1$ is fixed, $\alpha$ is a real number (e.g. $\alpha = \sqrt{2}$) and 
$\{x\}$ denotes the fractional part of x. Such sets are “almost periodic” 
because there is a strong correlation between the events $n \in A$ and $n + L \in A$, 
thanks to the identity $\{\alpha(n + L)\} - \{\alpha n\} = \{\alpha L\}$ mod 1. An easy 
application of the Dirichlet approximation theorem (to locate an approximate 
period L with $\{\alpha L\}$ small) shows that such sets still have infinitely many 
progressions of any given length k. Note that this argument works 
regardless of whether $\alpha$ is rational or irrational. 

• (Quadratically structured sets) Consider a “quadratically quasiperiodic” set 
of the form $A := \{n : \{\alpha n^2\} \leq \delta\}$. If $\alpha$ is irrational, then this set has upper 
density $\delta$, thanks to Weyl’s theorem on equidistribution of polynomials. (If 
$\alpha$ is rational, one can still obtain some lower bound on the upper density.) 
It is not linearly structured (there is no asymptotic correlation between the 
events $n \in A$ and $n + L \in A$ as $n \to \infty$ for any fixed non-zero L), however 
it has quadratic structure in the sense that there is a strong correlation 
between the events $n \in A$, $n + L \in A$, $n + 2L \in A$, thanks to the identity 

$$\{\alpha n^2\} - 2\{\alpha (n + L)^2\} + \{\alpha (n + 2L)^2\} = 2\{\alpha L^2\} \mod 1.$$

In particular A does not behave like a random set. Nevertheless, the 
quadratic structure still ensures that A contains infinitely many arithmetic 
progressions of any length k, as one first locates a “quadratic period” L with 
$\{\alpha L^2\}$ small, and then for suitable $n \in A$ one locates a much smaller “linear 
period” M with $\{\alpha L M n\}$ small. If this is done correctly, the progression 
$n, n + LM, \ldots, n + (k-1)LM$ will be completely contained in A. The same 
arguments also extend to a more general class of quadratically structured 
sets, such as the “2-step nilperiodic” set $A = \{n : \{\lfloor \sqrt{2} n \rfloor \sqrt{3} n\} \leq \delta\}$, where 
$\lfloor x \rfloor$ is the greatest integer function. 

• (Random subsets of structured sets) Continuing the previous example 
$A := \{n : \{\alpha n^2\} \leq \delta\}$, let A' be a random subset of A with each $n \in A$ lying in 
A' with an independent probability of $\delta'$ for some $0 < \delta' < 1$. Then this set 
A' almost surely has a positive density of $\delta\delta'$ if $\alpha$ is irrational. The set A' 
almost surely has infinitely many progressions of length k, since A already 
starts with infinitely many such progressions, and each such progression has 
a probability of $(\delta')^k$ of also lying in A'. One can generalize this example 
to random sets A where the events $n \in A$ are independent as n varies, and 
the probability $\mathbf{P}(n \in A)$ is a “quadratically almost periodic” function of 
n such as $\mathbf{P}(n \in A) = F(\{\alpha n^2\})$ for some nice (e.g. piecewise continuous) 
function F taking values between 0 and 1; the preceding example is the 
case where $F(x) := \delta' 1_{x \leq \delta}$. It is also possible to adapt this argument 
to (possibly sparse) pseudorandom subsets of structured sets, though one 
needs to take some care in defining exactly what “pseudorandom” means 
here. 

• (Sets containing random subsets of structured sets) Let A'' be any set which 
contains the set A' (or A) of the previous example. Since A' contains 
infinitely many progressions of length k, it is trivial that A'' does also. 
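
The contrast between the random and the linearly structured examples above can be checked numerically. The following is a minimal sketch, not part of the original argument: the parameters (alpha = sqrt(2), delta = 0.3, the cutoff 2000, the seed) and all function names are illustrative assumptions. It counts progressions in a random set of density delta, and in the set {n : {alpha n} <= delta} along an approximate period L found by brute force, in the spirit of the Dirichlet approximation theorem.

```python
import math
import random


def frac(x):
    """Fractional part {x}, a value in [0, 1)."""
    return x - math.floor(x)


def count_with_step(members, n_max, k, step):
    """Count n >= 1 such that n, n+step, ..., n+(k-1)*step all lie in
    `members` and the last term does not exceed n_max."""
    last_start = n_max - (k - 1) * step
    return sum(
        1
        for n in range(1, last_start + 1)
        if all((n + j * step) in members for j in range(k))
    )


def count_progressions(members, n_max, k, max_step):
    """Count k-term progressions in `members` with step 1..max_step."""
    return sum(count_with_step(members, n_max, k, r) for r in range(1, max_step + 1))


def expected_random_count(n_max, k, max_step, delta):
    """Expected count for a random set of density delta: every candidate
    progression survives with probability delta**k."""
    candidates = sum(max(n_max - (k - 1) * r, 0) for r in range(1, max_step + 1))
    return candidates * delta ** k


def random_set(delta, n_max, seed=0):
    """Random subset of {1..n_max}; each n is kept independently with prob. delta."""
    rng = random.Random(seed)
    return {n for n in range(1, n_max + 1) if rng.random() < delta}


def bohr_set(alpha, delta, n_max):
    """The linearly structured set {n <= n_max : {alpha*n} <= delta}."""
    return {n for n in range(1, n_max + 1) if frac(alpha * n) <= delta}


def best_period(alpha, limit):
    """L in [1, limit] for which alpha*L is closest to an integer."""
    def dist(L):
        t = frac(alpha * L)
        return min(t, 1.0 - t)
    return min(range(1, limit + 1), key=dist)


if __name__ == "__main__":
    n_max, delta, k = 2000, 0.3, 4
    alpha = math.sqrt(2)
    A_random = random_set(delta, n_max)
    A_bohr = bohr_set(alpha, delta, n_max)
    L = best_period(alpha, 100)
    print("approximate period L =", L)
    print("random set, step L:    ", count_with_step(A_random, n_max, k, L))
    print("structured set, step L:", count_with_step(A_bohr, n_max, k, L))
    print("random set, 3-term, steps 1..20:",
          count_progressions(A_random, n_max, 3, 20),
          "expected about", round(expected_random_count(n_max, 3, 20, delta)))
```

In this sketch the random set has roughly a delta^k proportion of the candidate progressions for every step, while the structured set has far more progressions along the approximate period L than a random set of the same density would, which is the sense in which the two cases hold for different reasons.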

As the above examples should make clear, the reason for the truth of Szemerédi’s 
theorem is very different in the cases when A is random, and when A is structured. 
These two cases can then be combined to handle the case when A is (or contains) 
a large (pseudo-)random subset of a structured set. Each of the proofs of 
Szemerédi’s theorem now hinges on a structure theorem which, very roughly speaking, 
asserts that every set of positive density is (or contains) a large pseudorandom 
subset of a structured set; each of the four proofs obtains a structure theorem of 
this sort in a different way (and in a very different language). These remarkable 
structural results - which include the Furstenberg structure theorem and the 
Szemerédi regularity lemma as examples - are of independent interest (beyond their 
immediate applications to arithmetic progressions), and have led to many further 
developments and insights. For instance, in [?] a “weighted” structure theorem 
(which was in some sense a hybrid of the Furstenberg structure theorem and the 
Szemerédi regularity lemma) was the primary new ingredient in proving that the primes 
P := {2, 3, 5, 7, ...} contain arbitrarily long arithmetic progressions. While that 
latter claim is ostensibly a number-theoretical result, the method of proof in fact 
uses surprisingly little from number theory, being much closer in spirit to the proofs 
of Szemerédi’s theorem (and in fact Szemerédi’s theorem is a crucial ingredient in 
the proof). This can be seen from the fact that the argument in [?] in fact proves 
the following stronger result: 

Theorem 1.2 (Szemerédi’s theorem in the primes) [?]. Let A be a subset of the 
primes P of positive relative upper density, thus $\limsup_{N \to \infty} \frac{|A \cap [-N,N]|}{|P \cap [-N,N]|} > 0$. Then 
for any $k \geq 3$, A contains infinitely many arithmetic progressions of length k. 

This result was first established in the k = 3 case by Green [?], the key step 
again being a (Fourier-analytic) structure theorem, this time for subsets of the 
primes. The arguments used to prove this theorem do not directly address the 
important question of whether the primes P (or any subset thereof) have any 
pseudorandomness properties (but see Section [3 below). However, the structure 
theorem does allow one to (essentially) describe any dense subset of the primes as 
a (sparse) pseudorandom subset of some unspecified dense set, which turns out to 
be sufficient (thanks to Szemerédi’s theorem) for the purpose of establishing the 
existence of arithmetic progressions. 

There are now several expositions of Theorem 1.2; see for instance [?]. 
Rather than give another exposition of this result, we have chosen 
to take a broader view, surveying the collection of structural theorems which 
underlie the proof of such results as Theorem 1.1 and Theorem 1.2. These theorems 
have remarkably varied contexts - measure theory, ergodic theory, graph theory, 
hypergraph theory, probability theory, information theory, and Fourier analysis - 
and can be either qualitative (infinitary) or quantitative (finitary) in nature. However, 
their proofs tend to share a number of common features, and thus serve as a 
kind of “Rosetta stone” connecting these various fields. Firstly, for a given class of 
objects, one quantifies what it means for an object to be “(pseudo-)random” and an 
object to be “structured”. Then, one establishes a dichotomy between randomness 
and structure, which typically looks something like this: 

If an object is not (pseudo-)random, then it (or some non-trivial 
component of it) correlates with a structured object. 

One can then iterate this dichotomy repeatedly (e.g. via a stopping time argument, 
or by Zorn’s lemma), to extract out all the correlations with structured 
objects, to obtain a weak structure theorem which typically looks as follows: 

If A is an arbitrary object, then A (or some non-trivial component 
of A) splits as the sum of a structured object, plus a pseudorandom 
error. 

In many circumstances, we need to improve this result to a strong structure 
theorem: 

If A is an arbitrary object, then A (or some non-trivial component 
of A) splits as the sum of a structured object, plus a small error, 
plus a very pseudorandom error. 

When one is working in an infinitary (qualitative) setting rather than a finitary 
(quantitative) one - which is for instance the case in the ergodic theory approach - 
one works instead with an asymptotic structure theorem: 

If A is an arbitrary object, then A (or some non-trivial component 
of A) splits as the sum of a “compact” object (the limit of structured 
objects), plus an infinitely pseudorandom error. 

The reason for the terminology “compact” to describe the limit of structured 
objects is in analogy to how a compact operator can be viewed as the limit of finite 
rank operators; see [?] for further discussion. 

In many applications, the small or pseudorandom errors in these structure theorems 
are negligible, and one then reduces to the study of structured objects. One 
then exploits the structure of these objects to conclude the desired application. 

Our focus here is on the structure theorems related to Szemerédi’s theorem and 
related results such as Theorem 1.2; we will not have space to describe all the 
generalizations and refinements of these results here. However, these types of structural 
theorems appear in other contexts also, for instance the Komlós subsequence 
principle [?] in probability theory. The Lebesgue decomposition of a spectral measure 
into pure point, singular continuous, and absolutely continuous spectral components 
can also be viewed as a structure theorem of the above type. Also, the stopping 
time arguments which underlie the structural theorems here are also widely used in 
harmonic analysis, in particular in obtaining fundamental decompositions such as the 
Calderón-Zygmund decomposition or the atomic decomposition of Hardy spaces 
(see e.g. [?]), as well as the tree selection arguments used in multilinear 
harmonic analysis (see e.g. [?]). It may be worth investigating whether there are any 
concrete connections between these disparate structural theorems. 


2. Ergodic theory 

We now illustrate the above general strategy in a number of contexts, beginning 
with the ergodic theory approach to Szemeredi’s theorem, where the dichotomy 
between structure and randomness is particularly clean and explicit, and one can 
work with an asymptotic structure theorem rather than a weak or strong one. Very 
informally speaking, the ergodic theory approach seeks to understand the set A of 
integers by analyzing the asymptotic correlations of the shifts $A + n := \{a + n : a \in A\}$ 
(or of various asymptotic averages of these shifts), and treating these shifts as 
occurring on an abstract measure space. More formally, let X be a measure space 
with probability measure $d\mu$, and let $T : X \to X$ be a bijection such that $T$ and $T^{-1}$ 
are both measure-preserving maps. The associated shift operator $T : f \mapsto f \circ T^{-1}$ 
is thus a unitary operator on the Hilbert space $L^2(X)$ of complex-valued 
square-integrable functions with the usual inner product $\langle f, g \rangle := \int_X f \overline{g}\, d\mu$. A famous 
transference result known as the Furstenberg correspondence principle¹ (see 
[?]) shows that Szemerédi’s theorem is then equivalent to 

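Theorem 2.1 (Furstenberg recurrence theorem). Let X and T be as above, 
and let $f \in L^\infty(X)$ be any bounded non-negative function with $\int_X f\, d\mu > 0$. Then 
for any $k \geq 1$ we have 

$$\liminf_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f\, T^n f \cdots T^{(k-1)n} f\, d\mu > 0.$$

Here and in the sequel we use $\mathbf{E}_{n \in I} a_n$ as a shorthand for the average $\frac{1}{|I|} \sum_{n \in I} a_n$. 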

When k = 2 this is essentially the Poincaré recurrence theorem; by using the 
von Neumann ergodic theorem one can also show that the limit exists (thus the 
lim inf can be replaced with a lim). The k = 3 case can be proved by the following 
argument, as observed in [?]. We need to show that 

$$\liminf_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f\, T^n f\, T^{2n} f\, d\mu > 0 \tag{1}$$

whenever f is bounded, non-negative, and has positive integral. 

The first key observation is that any sufficiently pseudorandom component of f 
will give a negligible contribution to (1) and can be dropped. More precisely, let 
us call f linearly pseudorandom (or weakly mixing) with respect to the shift T if 
we have 

$$\lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} |\langle T^n f, f \rangle|^2 = 0. \tag{2}$$

Such functions are negligible for the purpose of computing averages such as those 
in (1); indeed, if at least one of $f, g, h \in L^\infty(X)$ is linearly pseudorandom, then 
an easy application of van der Corput’s lemma (which in turn is an application of 
Cauchy-Schwarz) shows that 

$$\lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f\, T^n g\, T^{2n} h\, d\mu = 0.$$

¹Morally speaking, to deduce Szemerédi’s theorem from Furstenberg’s theorem, one takes X 
to be the integers Z, T to be the standard shift $n \mapsto n + 1$, and $\mu$ to be the density 
$\mu(A) = \lim_{N \to \infty} \frac{|A \cap [-N,N]|}{|[-N,N]|}$. This does not quite work because not all sets A have a well-defined density; 
however, additional arguments (e.g. using the Hahn-Banach theorem) can fix this problem. 
We shall refer to these types of results - that pseudorandom functions are negligible 
when averaged against other functions - as generalized von Neumann theorems. 

In view of this generalized von Neumann theorem, one is now tempted to “quotient 
out” all the pseudorandom functions and work with a reduced class of “structured” 
functions. In this particular case, it turns out that the correct notion of 
structure is that of linearly almost periodic functions, which are in turn generated 
by the linear eigenfunctions of T. To make this more precise, we need the following 
dichotomy: 

Lemma 2.2 (Dichotomy between randomness and structure). Suppose that 
$f \in L^\infty(X)$ is not linearly pseudorandom. Then there exists a linear eigenfunction 
$g \in L^\infty(X)$ of T (thus $Tg = \lambda g$ for some $\lambda \in \mathbf{C}$) such that $\langle f, g \rangle \neq 0$. 

Remark 2.3. Observe that if g is a linear eigenfunction of T with $Tg = \lambda g$, then 
$|\lambda| = 1$ and $\lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X g\, T^n \overline{g}^2\, T^{2n} g\, d\mu = \int_X |g|^4\, d\mu$. Thus linear 
eigenfunctions can and do give nontrivial contributions to the expression in (1). One can 
view Lemma 2.2 as a converse to this observation. 
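
The cancellation behind Remark 2.3 can be seen concretely in the simplest system, a rotation of the circle. The sketch below is an illustration under stated assumptions, not taken from the paper: it takes X = R/Z with T(x) = x + alpha, uses the shift operator convention Tf = f o T^{-1} from above, and the choices alpha = sqrt(2), g(x) = exp(2 pi i x) and all function names are hypothetical. It checks that g is an eigenfunction with an eigenvalue of modulus 1, and that the integrand g * conj(T^n g)^2 * T^{2n} g equals |g|^4 = 1 pointwise.

```python
import cmath
import math

ALPHA = math.sqrt(2)  # rotation angle; an arbitrary illustrative choice


def g(x, m=1):
    """The character g(x) = exp(2*pi*i*m*x) on the circle R/Z."""
    return cmath.exp(2j * math.pi * m * x)


def shifted(f, n, alpha=ALPHA):
    """T^n f = f o T^{-n} for the rotation T(x) = x + alpha."""
    return lambda x: f(x - n * alpha)


def eigenvalue_ratio(x, alpha=ALPHA):
    """(Tg)(x) / g(x); for an eigenfunction this does not depend on x."""
    return shifted(g, 1, alpha)(x) / g(x)


def integrand(x, n, alpha=ALPHA):
    """g * conj(T^n g)^2 * T^{2n} g evaluated at the point x."""
    gn = shifted(g, n, alpha)(x)
    g2n = shifted(g, 2 * n, alpha)(x)
    return g(x) * gn.conjugate() ** 2 * g2n


if __name__ == "__main__":
    print("eigenvalue at x=0.1 :", eigenvalue_ratio(0.1))
    print("eigenvalue at x=0.73:", eigenvalue_ratio(0.73))
    print("integrand at x=0.2, n=5:", integrand(0.2, 5))
```

Since the integrand is identically 1 for every n, its average over n does not decay, which is the non-trivial contribution the remark refers to.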

Proof. (Sketch) Let S denote the operator $Sg := \lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \langle g, T^n f \rangle T^n f$ 
(this limit exists by the von Neumann ergodic theorem). One can show that S is 
self-adjoint, compact, and commutes with T, and thus by spectral theory has an 
expansion of the form $Sg = \sum_k c_k \langle g, g_k \rangle g_k$ where $g_k$ are a countable sequence of 
eigenfunctions of T and $c_k$ are scalars. Since f is not linearly pseudorandom, we 
have $\langle Sf, f \rangle > 0$, so in particular Sf is non-zero. This implies that $\langle f, g_k \rangle \neq 0$ 
for one of the eigenfunctions $g_k$, and we are done. (The eigenfunctions must be 
bounded since S maps $L^2(X)$ to $L^\infty(X)$.) □ 

This lemma has the following consequence. Let $Z_1$ be the σ-algebra generated by 
all the eigenfunctions of T; this is known as the Kronecker factor of X, and roughly 
speaking encapsulates all the “linear structure” in the measure preserving system. 
Given any function $f \in L^2(X)$, we have the decomposition $f = f_{U^\perp} + f_U$, where 
$f_{U^\perp} := \mathbf{E}(f|Z_1)$ is the conditional expectation of f with respect to the σ-algebra 
$Z_1$ (i.e. the orthogonal projection from $L^2(X)$ to the $Z_1$-measurable functions). 
By construction, $f_U := f - \mathbf{E}(f|Z_1)$ is orthogonal to every eigenfunction of T, and 
is hence linearly pseudorandom by Lemma 2.2. In particular, we have established 

Proposition 2.4 (Asymptotic structure theorem). Let f be bounded and 
non-negative, with positive integral. Then we can split² $f = f_{U^\perp} + f_U$, where $f_{U^\perp}$ is 
bounded, non-negative, and $Z_1$-measurable (and thus approximable in $L^2$ to 
arbitrary accuracy by finite linear combinations of linear eigenfunctions), with positive 
integral, and $f_U$ is linearly pseudorandom. 

This result is closely related to the Koopman-von Neumann theorem in ergodic 
theory. In the language of the introduction, it asserts (very roughly speaking) that 
any set A of integers can be viewed as a (linearly) pseudorandom set where the 
“probability” $f_{U^\perp}(n)$ that a given element n lies in A is a (linearly) almost periodic 
function of n. 

Note that the linearly pseudorandom component $f_U$ of f gives no contribution to 
(1), thanks to the generalized von Neumann theorem. Thus we may freely replace 
f by $f_{U^\perp}$ if desired; in other words, for the purposes of proving (1) we may assume 
without loss of generality that f is measurable with respect to the Kronecker factor 
$Z_1$. In the notation of [?], we have just shown that the Kronecker factor is a 
characteristic factor for the recurrence in (1). (In fact it is essentially the universal 
factor for this recurrence, see [?] for further discussion.) 

²The notation is from [?]; the subscript U stands for “Gowers uniform” (pseudorandom), and 
$U^\perp$ for “Gowers anti-uniform” (structured). 

We have reduced the proof of (1) to the case when f is structured, in the sense 
of being measurable in $Z_1$. There are two ways to obtain the desired “structured 
recurrence” result. Firstly there is a “soft” approach, in which one observes that 
every $Z_1$-measurable square-integrable function f is almost periodic, in the sense 
that for any $\varepsilon > 0$ there exists a set of integers n of positive density such that $T^n f$ 
is within $\varepsilon$ of f in $L^2(X)$; from this it is easy to show that $\int_X f\, T^n f\, T^{2n} f\, d\mu$ is close 
to $\int_X f^3\, d\mu$ for a set of integers n of positive density, which implies (1). This almost 
periodicity can be verified by first checking it for polynomial combinations of linear 
eigenfunctions, and then extending by density arguments. There is also a “hard” 
approach, in which one obtains algebraic and topological control on the Kronecker 
factor $Z_1$. In fact, from a spectral analysis of T one can show that $Z_1$ is the inverse 
limit of a sequence of σ-algebras, on each of which the shift T is isomorphic to a 
shift $x \mapsto x + \alpha$ on a compact abelian Lie group G. This gives a very concrete 
description of the functions f which are measurable in the Kronecker factor, and 
one can establish (1) by a direct argument similar to that used in the introduction 
for linearly structured sets. This “hard” approach gives a bit more information; for 
instance, it can be used to show that the limit in (1) actually converges, so one can 
replace the lim inf by a lim. 

It turns out that these arguments extend (with some non-trivial effort) to the 
case of higher k. For sake of exposition let us just discuss the k = 4 case, though 
most of the assertions here extend to higher k. We wish to prove that 

$$\liminf_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f\, T^n f\, T^{2n} f\, T^{3n} f\, d\mu > 0 \tag{3}$$

whenever / is bounded, non-negative, and has positive integral. Here, it turns out 
that we must strengthen the notion of pseudorandomness (and hence generalize 
the notion of structure); linear pseudorandomness is no longer sufficient to imply 
negligibility. For instance, let f be a quadratic eigenfunction, in the sense that 
$Tf = \lambda f$, where $\lambda$ is no longer constant but is itself a linear eigenfunction, thus 
$T\lambda = c\lambda$ for some constant c. As an example, if $X = (\mathbf{R}/\mathbf{Z})^2$ with the skew shift 
$T(x,y) = (x + \alpha, y + x)$ for some fixed number $\alpha$, then the function 
$f(x,y) = e^{2\pi i y}$ is a quadratic eigenfunction but not a linear one. Typically such quadratic 
eigenfunctions will be linearly pseudorandom, but if $|\lambda| = |c| = 1$ (which is often 
the case) then we have the identity 

$$\mathbf{E}_{1 \leq n \leq N} \int_X f\, T^n \overline{f}^3\, T^{2n} f^3\, T^{3n} \overline{f}\, d\mu = \int_X |f|^8\, d\mu \tag{4}$$

and so we see that these functions can give non-trivial contributions to expressions 
such as (3). The correct notion of pseudorandomness is now quadratic 
pseudorandomness, by which we mean that 

$$\lim_{H \to \infty} \lim_{N \to \infty} \mathbf{E}_{1 \leq h \leq H} \mathbf{E}_{1 \leq n \leq N} |\langle T^h f \overline{f}, T^n (T^h f \overline{f}) \rangle|^2 = 0.$$

In other words, f is quadratically pseudorandom if and only if $T^h f \overline{f}$ is 
asymptotically linearly pseudorandom on the average as $h \to \infty$. Several applications of van 
der Corput’s lemma give a generalized von Neumann theorem, asserting that 

$$\lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f_0\, T^n f_1\, T^{2n} f_2\, T^{3n} f_3\, d\mu = 0$$

whenever $f_0, f_1, f_2, f_3$ are bounded functions with at least one function 
quadratically pseudorandom. 
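
The skew shift example and the identity (4) can be checked numerically. The following is a minimal sketch under stated assumptions: it uses the shift operator convention Tf = f o T^{-1} from the start of this section, takes alpha = sqrt(2) as an arbitrary choice, and all function names are hypothetical. It verifies that the multiplier lambda = Tf / f for f(x, y) = exp(2 pi i y) genuinely depends on x (so f is not a linear eigenfunction), and that the integrand of (4) equals |f|^8 = 1 pointwise.

```python
import cmath
import math

ALPHA = math.sqrt(2)  # the fixed number alpha in the skew shift; illustrative


def inverse_skew_shift(x, y, alpha=ALPHA):
    """T^{-1} for the skew shift T(x, y) = (x + alpha, y + x) on (R/Z)^2."""
    x_prev = (x - alpha) % 1.0
    y_prev = (y - x_prev) % 1.0
    return x_prev, y_prev


def f(x, y):
    """The quadratic eigenfunction f(x, y) = exp(2*pi*i*y)."""
    return cmath.exp(2j * math.pi * y)


def shifted_f(x, y, n, alpha=ALPHA):
    """(T^n f)(x, y) = f(T^{-n}(x, y)), computed by iterating T^{-1}."""
    for _ in range(n):
        x, y = inverse_skew_shift(x, y, alpha)
    return f(x, y)


def multiplier(x, y, alpha=ALPHA):
    """lambda(x, y) = (Tf)(x, y) / f(x, y)."""
    return shifted_f(x, y, 1, alpha) / f(x, y)


def identity_four_integrand(x, y, n, alpha=ALPHA):
    """f * conj(T^n f)^3 * (T^{2n} f)^3 * conj(T^{3n} f) at the point (x, y)."""
    f1 = shifted_f(x, y, n, alpha)
    f2 = shifted_f(x, y, 2 * n, alpha)
    f3 = shifted_f(x, y, 3 * n, alpha)
    return f(x, y) * f1.conjugate() ** 3 * f2 ** 3 * f3.conjugate()


if __name__ == "__main__":
    print("lambda at x=0.1:", multiplier(0.1, 0.5))
    print("lambda at x=0.4:", multiplier(0.4, 0.5))
    print("integrand of (4), n=7:", identity_four_integrand(0.3, 0.7, 7))
```

The exponents 1, -3, 3, -1 are the coefficients of a third difference, which annihilates the quadratic-in-n phase picked up by T^n f; this is why the integrand does not depend on n.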

One would now like to construct a factor $Z_2$ (presumably larger than the 
Kronecker factor $Z_1$) which will play the role of the Kronecker factor for the average 
(3); in particular, we would like a statement of the form 

Lemma 2.5 (Dichotomy between randomness and structure). Suppose that 
$f \in L^\infty(X)$ is not quadratically pseudorandom. Then there exists a $Z_2$-measurable function 
$g \in L^\infty(X)$ such that $\langle f, g \rangle \neq 0$. 

which would imply³ 

Proposition 2.6 (Asymptotic structure theorem). Let f be bounded and 
non-negative, with positive integral. Then we can split $f = f_{U^\perp} + f_U$, where $f_{U^\perp}$ is 
bounded, non-negative, and $Z_2$-measurable, with positive integral, and $f_U$ is 
quadratically pseudorandom. 

This reduces the proof of (3) to that of $Z_2$-measurable f. Such 
a factor $Z_2$ (which would be a characteristic factor for this average) is trivial to 
construct, as we could just take $Z_2$ to be the entire σ-algebra, and it is in fact easy 
(via Zorn’s lemma) to show the existence of a “best” such factor, which embeds into 
all other characteristic factors for this average (see [?]). Of course, for the concept 
of characteristic factor to be useful we would like $Z_2$ to be as small as possible, and 
furthermore to have some concrete structural description of the factor. An obvious 
guess for $Z_2$ would be the σ-algebra generated by all the linear and quadratic 
eigenfunctions, but this factor turns out to be a bit too small (see [?]; this is 
related to the example of the 2-step nilperiodic set in the introduction). A more 
effective candidate for $Z_2$, analogous to the “soft” description of the Kronecker 
factor, is the space of all “quadratically almost periodic functions”. This concept 
is a bit tricky to define rigorously (see e.g. [?]), but roughly speaking, a 
function f is linearly almost periodic if the orbit $\{T^n f : n \in \mathbf{Z}\}$ is precompact in 
$L^2(X)$ viewed as a Hilbert space, while a function f is quadratically almost periodic 
if the orbit is precompact in $L^2(X)$ viewed as a Hilbert module over the Kronecker 
factor $L^\infty(Z_1)$; this can be viewed as a matrix-valued (or more precisely compact 
operator-valued) extension of the concept of a quadratic eigenfunction. Another 
rough definition is as follows: a function f is linearly almost periodic if $T^n f(x)$ is 
close to $f(x)$ for many constants n, whereas a function f is quadratically almost 
periodic if $T^{n(x)} f(x)$ is close to $f(x)$ for a function $n(x)$ which is itself linearly 
almost periodic. It turns out that with this “soft” proposal for $Z_2$, it is easy to 
prove Lemma 2.5 and hence Proposition 2.6, essentially by obtaining a “relative” 

³One can generalize this structure theorem to obtain similar characteristic factors $Z_3$, $Z_4$, etc. for 
cubic pseudorandomness, quartic pseudorandomness, etc. Applying Zorn’s lemma, one eventually 
obtains the Furstenberg structure theorem, which decomposes any measure preserving system as a 
weakly mixing extension of a distal system, and thus decomposes any function as a distal function 
plus an “infinitely pseudorandom” error; see [?]. However this decomposition is not the most 
“efficient” way to prove Szemerédi’s theorem, as the notion of pseudorandomness is too strong, 
and hence the notion of structure too general. It does illustrate however that one does have 
considerable flexibility in where to draw the line between randomness and structure. 
version of the proof of Lemma 2.2. The derivation of (3) in this soft factor is 
slightly tricky though, requiring either van der Waerden’s theorem, or the color 
focusing argument used to prove van der Waerden’s theorem; see [?]. 
More recently, a more efficient “hard” factor $Z_2$ was constructed by 
Conze-Lesigne [7], Furstenberg-Weiss [?], and Host-Kra [?]; the analogous factors for 
higher k are more difficult to construct, but this was achieved by Host-Kra in [?] 
and also subsequently by Ziegler [?]. This factor yields more precise information, 
including convergence of the limit in (3). Here, the concept of a 2-step nilsystem 
is used to define structure. A 2-step nilsystem is a compact symmetric space $G/\Gamma$, 
with G a 2-step nilpotent Lie group and $\Gamma$ a closed subgroup, together with 
a shift element $a \in G$, which generates a shift $T(x\Gamma) := ax\Gamma$. The factor $Z_2$ 
constructed in these papers is then the inverse limit of a sequence of σ-algebras, 
on which the shift is equivalent to a 2-step nilsystem. This should be compared 
with the “hard” description of the Kronecker factor, which is the 1-step analogue 
of the above result. Establishing the bound (3) then reduces to the problem of 
understanding the structure of arithmetic progressions $x\Gamma, ax\Gamma, a^2 x\Gamma, a^3 x\Gamma$ on 
the nilsystem, which can be handled by algebraic arguments, for instance using the 
machinery of Hall-Petresco sequences [?]. 

The ergodic methods, while non-elementary and non-quantitative (though see 
[?]), have proven to be the most powerful and flexible approach to Szemerédi’s 
theorem, leading to many generalizations and refinements. However, it seems that 
a purely “soft” ergodic approach is not quite capable by itself of extending to the 
primes as in Theorem 1.2, though it comes tantalizingly close. In particular, one 
can use Theorem 1.1 and a variant of the Furstenberg correspondence principle to 
establish Theorem 1.2 when the set of primes P is replaced by a random subset $\tilde{P}$ 
of the positive integers, with $n \in \tilde{P}$ with independent probability $1/\log n$ for n > 1; 
see [?]. Roughly speaking, if A is a subset of $\tilde{P}$, the idea is to construct an abstract 
measure-preserving system generated by a set $\tilde{A}$, in which $\mu(T^{n_1}\tilde{A} \cap \ldots \cap T^{n_k}\tilde{A})$ is 
the normalized density of $(A + n_1) \cap \ldots \cap (A + n_k)$ for any $n_1, \ldots, n_k$. Unfortunately, 
this approach requires the ambient space $\tilde{P}$ to be extremely pseudorandom and does 
not seem to extend easily to the primes. 


3. Fourier analysis 

We now turn to a more quantitative approach to Szemerédi’s theorem, based 
primarily on Fourier analysis and arithmetic combinatorics. Here, one analyzes 
a set of integers A finitarily, truncating to a finite setting such as the discrete 
interval {1,..., N} or the cyclic group Z/NZ, and then testing the correlations of 
A with linear phases such as $n \mapsto e^{2\pi i k n/N}$, quadratic phases $n \mapsto e^{2\pi i k n^2/N}$, or 
similar objects. This approach has led to the best known bounds on Szemerédi’s 
theorem, though it has not yet been able to handle many of the generalizations 
of this theorem that can be treated by ergodic or graph-theoretic methods. In 
analogy with the ergodic arguments, the k = 3 case of Szemerédi’s theorem can be 
handled by linear Fourier analysis (as was done by Roth [?]), while the k = 4 case 
requires quadratic Fourier analysis (as was done by Gowers [?]), and so forth for 
higher order k (see [?]). The Fourier analytic approach seems to be closely related 
to the theory of the “hard” characteristic factors discovered in the ergodic theory 
arguments, although the precise nature of this relationship is still being understood. 
It is convenient to work in a cyclic group Z/NZ of prime order. It can be shown 
via averaging arguments (see [?]) that Szemerédi’s theorem is equivalent to the 
following quantitative version: 

Theorem 3.1 (Szemerédi’s theorem, quantitative version). Let N > 1 be a large 
prime, let $k \geq 3$, and let $0 < \delta < 1$. Let $f : \mathbf{Z}/N\mathbf{Z} \to \mathbf{R}$ be a function with 
$0 \leq f(x) \leq 1$ for all $x \in \mathbf{Z}/N\mathbf{Z}$ and $\mathbf{E}_{x \in \mathbf{Z}/N\mathbf{Z}} f(x) \geq \delta$. Then we have 

$$\mathbf{E}_{x,r \in \mathbf{Z}/N\mathbf{Z}} f(x)\, T^r f(x) \cdots T^{(k-1)r} f(x) \geq c(k,\delta)$$

for some $c(k,\delta) > 0$ depending only on k and $\delta$, where $T^r f(x) := f(x + r)$ is the 
shift operator on Z/NZ. 
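
The average in Theorem 3.1 is easy to compute by brute force for small N. The sketch below is only an illustration of the quantity being bounded, not of any proof; the function name, the choices of N and the sample functions are assumptions made for the example.

```python
import random


def progression_average(f, k):
    """E_{x, r in Z/NZ} f(x) f(x+r) ... f(x+(k-1)r), where f is a list of
    length N holding the values f(0), ..., f(N-1)."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for r in range(N):
            term = 1.0
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / (N * N)


if __name__ == "__main__":
    # A constant function of mean delta gives exactly delta**k.
    print(progression_average([0.5] * 11, 3))

    # The indicator of a single point of Z/7Z only sees the trivial
    # progression x = 0, r = 0, so the average is 1/N**2.
    print(progression_average([1.0] + [0.0] * 6, 3))

    # The indicator of a random set of density about delta gives about delta**k.
    rng = random.Random(0)
    N, delta = 101, 0.3
    indicator = [1.0 if rng.random() < delta else 0.0 for _ in range(N)]
    print(progression_average(indicator, 3), "vs", (sum(indicator) / N) ** 3)
```

The single-point example shows why the content of the theorem lies in the lower bound being independent of N: the trivial progressions with r = 0 alone contribute only an amount that tends to zero as N grows.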

We remark that the Fourier-analytic arguments in Gowers [?] give the best 
known lower bounds on $c(k,\delta)$, namely $c(k,\delta) \geq 2^{-2^{1/\delta^{c_k}}}$, where $c_k := 2^{2^{k+9}}$. In 
the k = 3 case it is known that $c(3,\delta) \geq \delta^{C/\delta^2}$ for some absolute constant C, see 
[?]. A conjecture of Erdős and Turán [?] is roughly equivalent to asserting that 
$c(k,\delta) \geq e^{-C_k/\delta}$ for some $C_k$. In the converse direction, an example of Behrend 
shows that $c(3,\delta)$ cannot exceed $\delta^{c \log(1/\delta)}$ for some small absolute constant c, 
with similar results for higher values of k; in particular, $c(k,\delta)$ cannot be as large 
as any fixed power of $\delta$. This already rules out a number of elementary approaches 
to Szemerédi’s theorem and suggests that any proof must involve some sort of 
iterative argument. 

Let us first describe (in more “modern” language) Roth’s original proof [?] of 
Szemerédi’s theorem in the k = 3 case. We need to establish a bound of the form 

$$\mathbf{E}_{x,r \in \mathbf{Z}/N\mathbf{Z}} f(x)\, T^r f(x)\, T^{2r} f(x) \geq c(3,\delta) > 0 \tag{5}$$

when f takes values between 0 and 1 and has mean at least $\delta$. As in the ergodic 
argument, we first look for a notion of pseudorandomness which will ensure that 
the average in (5) is negligible. It is convenient to introduce the Gowers $U^2(\mathbf{Z}/N\mathbf{Z})$ 
uniformity norm by the formula 

\\f\\u^{Z/NZ) ■— En^z/Nz\Ex^Z/NzT"'f{x)f{x)\‘^, 

and informally refer to / as linearly pseudorandom (or linearly Gowers-uniform) if 
its norm is small; compare this with The norm is indeed a norm; this 
can be verified either by several applications of the Cauchy-Schwarz inequality, or 
via the Fourier identity 

(6) 11/11^2(2/^2;) = ^ l/(0l^i 

JGZ/TVZ 

where f{f) Exez/Nzfix)e~‘^^^^^^^ is the usual Fourier transform. Some further 
applications of Cauchy-Schwarz (or Plancherel’s theorem and Holder’s inequality) 
yields the generalized von Neumann theorem 

(V lEx,rez/Nzfo(x)T’'fi(x)T^’'f 2 (x)l< min ||//||( 72 (Z/WZ) 

J —G,i 

whenever /o, /i, /2 are bounded in magnitude by 1. Thus, as before, linearly pseu¬ 
dorandom functions give a small contribution to the average in 0, though now 
that we are in a finitary setting the contribution does not vanish completely. 
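Both the Fourier identity for the $U^2$ norm and the generalized von Neumann theorem for three-term averages can be checked numerically on small examples. The sketch below is a direct, unoptimized transcription of the definitions; the helper names, the seed and the modulus `N = 31` are assumptions made for illustration only.

```python
import cmath
import random


def fourier(f):
    """Normalised Fourier transform: f^(xi) = E_x f(x) e^{-2 pi i x xi / N}."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
        for xi in range(n)
    ]


def u2_norm(f):
    """U^2 norm from the definition: ||f||^4 = E_n |E_x f(x+n) conj(f(x))|^2."""
    n = len(f)
    total = 0.0
    for shift in range(n):
        inner = sum(f[(x + shift) % n] * f[x].conjugate() for x in range(n)) / n
        total += abs(inner) ** 2
    return (total / n) ** 0.25


def three_term_average(f0, f1, f2):
    """E_{x,r} f0(x) f1(x+r) f2(x+2r) over Z/NZ."""
    n = len(f0)
    total = 0.0
    for x in range(n):
        for r in range(n):
            total += f0[x] * f1[(x + r) % n] * f2[(x + 2 * r) % n]
    return total / (n * n)


random.seed(0)
N = 31  # small odd prime, for illustration only
f0, f1, f2 = ([random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(3))

# Fourier identity: ||f||_{U^2}^4 equals the sum of |f^(xi)|^4.
identity_gap = abs(u2_norm(f0) ** 4 - sum(abs(c) ** 4 for c in fourier(f0)))

# von Neumann inequality: the average is bounded by the smallest U^2 norm.
lhs = abs(three_term_average(f0, f1, f2))
rhs = min(u2_norm(g) for g in (f0, f1, f2))
```

For random bounded functions all three $U^2$ norms are small, so the inequality forces the three-term average to be small as well, which is the finitary sense in which pseudorandom functions contribute negligibly.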

The next step is to establish a dichotomy between linear pseudorandomness and
some sort of usable structure. From (6) and Plancherel’s theorem we easily obtain
the following analogue of Lemma 2.2.

Lemma 3.2 (Dichotomy between randomness and structure). Suppose that
$f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is bounded in magnitude by 1 with $\|f\|_{U^2(\mathbb{Z}/N\mathbb{Z})} \geq \eta$ for some
$0 < \eta \leq 1$. Then there exists a linear phase function $\phi : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ (thus
$\phi(x) = \xi x/N + c$ for some $\xi \in \mathbb{Z}/N\mathbb{Z}$ and $c \in \mathbb{R}/\mathbb{Z}$) such that

$$\left| \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f(x) e^{-2\pi i \phi(x)} \right| \geq \eta^2.$$
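The phase promised by Lemma 3.2 can be found by picking the largest Fourier coefficient: since $\sum_\xi |\hat f(\xi)|^4 \leq \sup_\xi |\hat f(\xi)|^2 \sum_\xi |\hat f(\xi)|^2 \leq \sup_\xi |\hat f(\xi)|^2$, a $U^2$ norm of at least $\eta$ forces some coefficient of size at least $\eta^2$. The following sketch illustrates this; the test function (a phase of frequency 5 plus bounded noise) and all names are hypothetical.

```python
import cmath
import random


def fourier(f):
    """Normalised Fourier transform: f^(xi) = E_x f(x) e^{-2 pi i x xi / N}."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
        for xi in range(n)
    ]


def best_linear_phase(f):
    """Return (xi, c, size) for the phase phi(x) = xi*x/N + c of largest correlation."""
    coeffs = fourier(f)
    xi = max(range(len(f)), key=lambda k: abs(coeffs[k]))
    # The constant c rotates the correlation onto the positive real axis.
    c = cmath.phase(coeffs[xi]) / (2 * cmath.pi)
    return xi, c, abs(coeffs[xi])


random.seed(1)
N = 31
# Hypothetical test function: a linear phase of frequency 5 plus bounded noise,
# so that |f(x)| <= 0.8 + 0.2 = 1.
f = [
    0.8 * cmath.exp(2j * cmath.pi * 5 * x / N) + 0.2 * random.choice((-1, 1))
    for x in range(N)
]

eta = sum(abs(c) ** 4 for c in fourier(f)) ** 0.25  # U^2 norm via the Fourier identity
xi, c, size = best_linear_phase(f)
correlation = sum(
    f[x] * cmath.exp(-2j * cmath.pi * (xi * x / N + c)) for x in range(N)
) / N
```

Here the noise can move any Fourier coefficient by at most 0.2, so the search must return the planted frequency 5.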

The next step is to iterate this lemma to obtain a suitable structure theorem.
There are two slightly different ways to do this. Firstly there is the original
density increment argument approach of Roth [?], which we sketch as follows.
It is convenient to work on a discrete interval $[1, N/3]$, which we identify
with a subset of $\mathbb{Z}/N\mathbb{Z}$ in the obvious manner. Let $f : [1, N/3] \to \mathbb{R}$ be a
non-negative function bounded in magnitude by 1, and let $\eta$ be a parameter to be
chosen later. If $f - \mathbb{E}_{1 \leq x \leq N/3} f(x)$ is not linearly pseudorandom, in the sense that
$\|f - \mathbb{E}_{1 \leq x \leq N/3} f(x)\|_{U^2(\mathbb{Z}/N\mathbb{Z})} \geq \eta$, then we apply Lemma 3.2 to obtain a correlation
with a linear phase $\phi$. An easy application of the Dirichlet approximation theorem
then shows that one can partition $[1, N/3]$ into arithmetic progressions (of length
roughly $\eta^2 \sqrt{N}$) on which $\phi$ is essentially constant (fluctuating by at most $\eta^2/100$,
say). A pigeonhole argument (exploiting the fact that $f - \mathbb{E}_{1 \leq x \leq N/3} f(x)$ has mean
zero) then shows that on one of these progressions, say $P$, $f$ has significantly higher
density than on the average, in the sense that $\mathbb{E}_{x \in P} f(x) \geq \mathbb{E}_{1 \leq x \leq N/3} f(x) + \eta^2/100$.
One can then apply an affine transformation to convert this progression $P$ into
another discrete interval $\{1, \ldots, N'/3\}$, where $N'$ is essentially the square root of $N$.
One then iterates this argument until linear pseudorandomness is obtained (using
the fact that the density of $f$ cannot increase beyond 1), and one eventually obtains

Theorem 3.3 (Structure theorem). Let $f : [1, N/3] \to \mathbb{R}$ be a non-negative
function bounded by 1, and let $\eta > 0$. Then there exists a progression $P$ in $[1, N/3]$
of length at least $c(\eta) N^{c(\eta)}$ for some $c(\eta) > 0$, on which we have the splitting
$f = f_{U^\perp} + f_U$, where $f_{U^\perp} := \mathbb{E}_{x \in P} f(x) \geq \mathbb{E}_{1 \leq x \leq N/3} f(x)$ is the mean of $f$ on $P$,
and $f_U$ is linearly pseudorandom in the sense that

$$\|f_U\|_{U^2(\mathbb{Z}/M\mathbb{Z})} \leq \eta,$$

where we identify $P$ with a subset of a cyclic group $\mathbb{Z}/M\mathbb{Z}$ of cardinality $M \sim 3|P|$
in the usual manner.

More informally, any function will contain an arithmetic progression $P$ of
significant size on which $f$ can be decomposed into a non-trivial structured component
$f_{U^\perp}$ and a pseudorandom component $f_U$. In the language of the introduction, it is
essentially saying that any dense set $A$ of integers will contain components which
are dense pseudorandom subsets of long progressions. Once one has this theorem,
it is an easy matter to establish Szemerédi’s theorem in the $k = 3$ case. Indeed, if
$A \subset \mathbb{Z}$ has upper density greater than $\delta$, then we can find arbitrarily large primes
$N$ such that $|A \cap [1, N/3]| \geq \delta N/3$. Applying Theorem 3.3 with $\eta := \delta^3/100$, and
$f$ equal to the indicator function of $A \cap [1, N/3]$, we can find a progression $P$ in
$\{1, \ldots, N/3\}$ of length at least $c(\delta) N^{c(\delta)}$ on which $\mathbb{E}_{x \in P} f(x) \geq \delta$ and $f - \mathbb{E}_{x \in P} f(x)$
is linearly pseudorandom in the sense of Theorem 3.3. It is then an easy matter
to apply the generalized von Neumann theorem to show that $A \cap P$ contains many
arithmetic progressions of length three (in fact it contains $\gg \delta^3 |P|^2$ such
progressions). Letting $N$ (and hence $|P|$) tend to infinity we obtain Szemerédi’s theorem
in the $k = 3$ case. An averaging argument of Varnavides [?] then yields the more
quantitative version in Theorem 3.1 (but with a moderately bad bound for $c(3,\delta)$,
namely $c(3,\delta) = 2^{-2^{C/\delta^C}}$ for some absolute constant $C$).

A more refined structure theorem was given in [23] (see also [?]), which was
termed an “arithmetic regularity lemma” in analogy with the Szemerédi regularity
lemma which we discuss in the next section. That theorem has similar hypotheses
to Theorem 3.3, but instead of constructing a single progression $P$ on which one
has pseudorandomness, one partitions $[1, N/3]$ into many long progressions*,
on most of which the function $f$ becomes linearly pseudorandom (after subtracting
the mean). A related structure theorem (with a more “ergodic” perspective) was
also given in [?]. Here we give an alternate approach based on Fourier expansion
and the pigeonhole principle. Observe that for any $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ and any
threshold $\lambda$ we have the Fourier decomposition $f = f_{U^\perp} + f_U$, where the “structured”
component $f_{U^\perp}(x) := \sum_{\xi : |\hat f(\xi)| \geq \lambda} \hat f(\xi) e^{2\pi i x \xi/N}$ contains all the significant Fourier
coefficients, and the “pseudorandom” component $f_U(x) := \sum_{\xi : |\hat f(\xi)| < \lambda} \hat f(\xi) e^{2\pi i x \xi/N}$
contains all the small Fourier coefficients. Using Plancherel’s theorem one can easily
establish

Theorem 3.4 (Weak structure theorem). Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ be a function bounded
in magnitude by 1, and let $0 < \lambda < 1$. Then we can split $f = f_{U^\perp} + f_U$, where $f_{U^\perp}$
is the linear combination of at most $O(1/\lambda^2)$ linear phase functions $x \mapsto e^{2\pi i x \xi/N}$,
and $f_U$ is linearly pseudorandom in the sense that $\|f_U\|_{U^2(\mathbb{Z}/N\mathbb{Z})} \leq \lambda^{1/2}$.

This theorem asserts that an arbitrary bounded function only has a bounded 
amount of significant linear Fourier-analytic structure; after removing this bounded 
amount of structure, the remainder is linearly pseudorandom. 
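The thresholded Fourier decomposition behind the weak structure theorem is simple enough to carry out directly. The sketch below splits a random sign pattern at threshold $\lambda$; Plancherel gives at most $1/\lambda^2$ large coefficients, and the Fourier identity for the $U^2$ norm gives $\|f_U\|_{U^2}^4 \leq \lambda^2 \sum_\xi |\hat f(\xi)|^2 \leq \lambda^2$. The helper names, the seed and the parameter values are illustrative assumptions.

```python
import cmath
import random


def fourier(f):
    """Normalised Fourier transform: f^(xi) = E_x f(x) e^{-2 pi i x xi / N}."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
        for xi in range(n)
    ]


def u2_norm(f):
    """U^2 norm computed through the Fourier identity."""
    return sum(abs(c) ** 4 for c in fourier(f)) ** 0.25


def weak_decomposition(f, lam):
    """Split f into the part carried by coefficients >= lam and the remainder."""
    n = len(f)
    coeffs = fourier(f)
    large = [xi for xi in range(n) if abs(coeffs[xi]) >= lam]
    structured = [
        sum(coeffs[xi] * cmath.exp(2j * cmath.pi * x * xi / n) for xi in large)
        for x in range(n)
    ]
    pseudorandom = [f[x] - structured[x] for x in range(n)]
    return structured, pseudorandom, len(large)


random.seed(2)
N = 31
lam = 0.25
f = [random.choice((-1.0, 1.0)) for _ in range(N)]  # bounded in magnitude by 1
structured, pseudorandom, num_phases = weak_decomposition(f, lam)
residual_norm = u2_norm(pseudorandom)
```

Note that nothing in this construction keeps `structured` bounded by 1 or non-negative, which is the first of the two weaknesses discussed next.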

This theorem, while simple to state and prove, has two weaknesses which make
it unsuitable for such tasks as counting progressions of length three. Firstly, even
though $f$ is bounded by 1, the components $f_{U^\perp}, f_U$ need not be. Related to this, if $f$
is non-negative, there is no reason why $f_{U^\perp}$ should be non-negative also. Secondly,
the pseudorandomness control on $f_U$ is not very good when compared against the
complexity of $f_{U^\perp}$ (i.e. the number of linear exponentials needed to describe $f_{U^\perp}$).
In practice, this means that any control one obtains on the structured component
of $f$ will be dominated by the errors one has to concede from the pseudorandom
component. Fortunately, both of these defects can be repaired, the former by a Fejér
summation argument, and the latter by a pigeonhole argument (which introduces
a second error term $f_S$, which is small in $L^2$ norm). More precisely, we have

Theorem 3.5 (Strong structure theorem). Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a non-negative
function bounded by 1, and let $0 < \varepsilon < 1$. Let $F : \mathbb{N} \to \mathbb{N}$ be an arbitrary
increasing function (e.g. $F(n) = 2^n$). Then there exists an integer $T = O_{F,\varepsilon}(1)$
and a decomposition $f = f_{U^\perp} + f_S + f_U$, where $f_{U^\perp}$ is the linear combination of
at most $T$ linear phase functions, $f_U$ is linearly pseudorandom in the sense that
$\|f_U\|_{U^2(\mathbb{Z}/N\mathbb{Z})} = O(1/F(T))$, and $f_S$ is small in the sense that $\|f_S\|_{L^2(\mathbb{Z}/N\mathbb{Z})} :=
(\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} |f_S(x)|^2)^{1/2} = O(\varepsilon)$. Furthermore, $f_{U^\perp}, f_U$ are bounded in magnitude by
1. Also, $f_{U^\perp}$ and $f_{U^\perp} + f_S$ are non-negative with the same mean as $f$.

Proof. We use an argument from [?]. We may take $\varepsilon = 1/M$ for some large integer
$M$. Let $N_1, N_2, \ldots, N_{M^2+2}$ be defined recursively by $N_1 := M$ and $N_{m+1} :=
F(G(N_m))^2$, where $G : \mathbb{N} \to \mathbb{N}$ is a function depending on $\varepsilon$ that we shall choose
later. From Plancherel’s theorem we have

$$\sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^2 \leq 1$$

and hence by the pigeonhole principle we can find $1 \leq m \leq M^2$ such that

$$\sum_{\xi : 1/N_{m+2} \leq |\hat f(\xi)| < 1/N_m} |\hat f(\xi)|^2 \leq 1/M^2 = O(\varepsilon^2).$$

Now, for each $m$ we define a Fejér-like kernel $K^{(m)} : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}^+$
which is non-negative, has mean one, has Fourier coefficients $1 + O(\varepsilon)$ for all $\xi$
with $|\hat f(\xi)| \geq 1/N_m$, and is a linear combination of at most $O_{N_m,\varepsilon}(1)$ linear phase
functions. Such a function can be constructed in a “hard” manner by means of
Riesz products, or in a more “soft” manner by using the Weierstrass approximation
theorem; we omit the details. If we then set

$$f_{U^\perp} := f * K^{(m)}; \qquad f_S := f * K^{(m+1)} - f * K^{(m)}; \qquad f_U := f - f * K^{(m+1)}$$

with $T$ equal to the number of linear phase functions comprising $K^{(m)}$, then by
repeated use of Plancherel’s theorem one can verify all the required properties (if
the function $G$ is chosen sufficiently fast growing, depending on $\varepsilon$). □

* Actually, for technical reasons it is more efficient to replace the notion of an arithmetic
progression by a slightly different object known as a Bohr set; see [?], [?] for details.

Note that we have the freedom to set the growth function $F$ arbitrarily fast in the
above proposition; this corresponds roughly speaking to the fact that in the ergodic
counterpart to this structure theorem (Proposition 2.4) the pseudorandom error $f_U$
has asymptotically vanishing Gowers $U^2$ norm. One can view $f_{U^\perp}$ as a “coarse”
Fourier approximation to $f$, and $f_{U^\perp} + f_S$ as a “fine” Fourier approximation to $f$;
this perspective links this proposition with the graph regularity lemmas that we
discuss in the next section.

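Theorem 3.5 can be used to deduce the structure theorems in [?], [?], [?], while
a closely related result was also established in [?]. It can also be used to directly
derive the $k = 3$ case of Theorem 3.1 as follows. Let $f$ be as in that proposition,
and let $\varepsilon := \delta^3/100$. We apply Theorem 3.5 to decompose $f = f_{U^\perp} + f_S + f_U$.
Because $f_{U^\perp}$ has only $T$ Fourier exponentials, it is easy to see that $f_{U^\perp}$ is almost
periodic, in the sense that $\|T^n f_{U^\perp} - f_{U^\perp}\|_{L^2(\mathbb{Z}/N\mathbb{Z})} \leq \varepsilon$ for at least $c(\varepsilon, T) N$ values
of $n \in \mathbb{Z}/N\mathbb{Z}$, for some $c(\varepsilon, T) > 0$. For such values of $n$, one can easily verify that

$$\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_{U^\perp}(x)\, T^n f_{U^\perp}(x)\, T^{2n} f_{U^\perp}(x) \geq \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_{U^\perp}(x)^3 - 3\varepsilon \geq \left( \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_{U^\perp}(x) \right)^3 - 3\varepsilon \geq \delta^3/2.$$

Because $f_S$ is small, we can also deduce that

$$\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} (f_{U^\perp} + f_S)(x)\, T^n (f_{U^\perp} + f_S)(x)\, T^{2n} (f_{U^\perp} + f_S)(x) \geq \delta^3/4$$

for these values of $n$. Averaging in $n$ (and taking advantage of the non-negativity
of $f_{U^\perp} + f_S$) we conclude that

$$\mathbb{E}_{x,n \in \mathbb{Z}/N\mathbb{Z}} (f_{U^\perp} + f_S)(x)\, T^n (f_{U^\perp} + f_S)(x)\, T^{2n} (f_{U^\perp} + f_S)(x) \geq \delta^3 c(\varepsilon, T)/4.$$

Adding in the pseudorandom error $f_U$ using the generalized von Neumann theorem
(7), we conclude that

$$\mathbb{E}_{x,n \in \mathbb{Z}/N\mathbb{Z}} f(x)\, T^n f(x)\, T^{2n} f(x) \geq \delta^3 c(\varepsilon, T)/4 - O(1/F(T)).$$

If we choose $F$ to be sufficiently rapidly growing depending on $\delta$ and $\varepsilon$, we can
absorb the error term in the main term and conclude that

$$\mathbb{E}_{x,n \in \mathbb{Z}/N\mathbb{Z}} f(x)\, T^n f(x)\, T^{2n} f(x) \geq \delta^3 c(\varepsilon, T)/8.$$

Since $T = O_{F,\varepsilon}(1) = O_\delta(1)$, we obtain the $k = 3$ case of Theorem 3.1 as desired.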

Roth’s original Fourier-analytic argument was published in 1953. But the
extension of this Fourier argument to the $k > 3$ case was not achieved until the work
of Gowers [?], [?] in 1998. For simplicity we once again restrict attention to the
$k = 4$ case, where the theory is more complete. Our objective is to show

$$\mathbb{E}_{x,r \in \mathbb{Z}/N\mathbb{Z}} f(x)\, T^r f(x)\, T^{2r} f(x)\, T^{3r} f(x) \geq c(4,\delta) > 0 \tag{8}$$

whenever $f$ is non-negative, bounded by 1, and has mean at least $\delta$. There are
some significant differences between this case and the $k = 3$ case (5). Firstly,
linear pseudorandomness is not enough to guarantee that a contribution to (8) is
negligible: for instance, if $f(x) := e^{2\pi i x^2/N}$ then

$$\mathbb{E}_{x,r \in \mathbb{Z}/N\mathbb{Z}} f(x)\, \overline{T^r f(x)}^{\,3}\, \left( T^{2r} f(x) \right)^3\, \overline{T^{3r} f(x)} = 1$$

despite $f$ being very linearly pseudorandom (the $U^2$ norm of $f$ is $N^{-1/4}$);
compare this example with (??). One must now utilize some sort of “quadratic Fourier
analysis” in order to capture the correct concept of pseudorandomness and
structure. Secondly, the Fourier-analytic arguments must now be supplemented by some
results from arithmetic combinatorics (notably the Balog-Szemerédi theorem, and
results related to Freiman’s inverse sumset theorem) in order to obtain a usable
notion of quadratic structure. Finally, as in the ergodic case, one cannot rely purely on
quadratic phase functions such as $e^{2\pi i (\xi x^2 + \eta x)/N}$ to generate all the relevant
structured objects, and must also consider generalized quadratic objects such as locally
quadratic phase functions, 2-step nilsequences (see below), or bracket quadratic
phases such as $e^{2\pi i \lfloor \sqrt{2} n \rfloor \sqrt{3} n}$.
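The quadratic phase example can be verified numerically. It rests on the identity $x^2 - 3(x+r)^2 + 3(x+2r)^2 - (x+3r)^2 = 0$, so a four-term average weighted with conjugates and cubes is identically 1, while every non-trivial shift of the phase is orthogonal to it. The following sketch uses the illustrative modulus `N = 31`; the function names are assumptions.

```python
import cmath

N = 31  # small odd prime, for illustration only


def e(t):
    """The standard character e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


# Quadratic phase f(x) = e(x^2 / N); the square is reduced mod N for accuracy.
f = [e((x * x % N) / N) for x in range(N)]


def weighted_four_term_average(f):
    """E_{x,r} f(x) conj(f(x+r))^3 f(x+2r)^3 conj(f(x+3r))."""
    n = len(f)
    total = 0j
    for x in range(n):
        for r in range(n):
            total += (
                f[x]
                * f[(x + r) % n].conjugate() ** 3
                * f[(x + 2 * r) % n] ** 3
                * f[(x + 3 * r) % n].conjugate()
            )
    return total / (n * n)


def u2_norm(f):
    """U^2 norm from the definition: ||f||^4 = E_n |E_x f(x+n) conj(f(x))|^2."""
    n = len(f)
    total = 0.0
    for shift in range(n):
        inner = sum(f[(x + shift) % n] * f[x].conjugate() for x in range(n)) / n
        total += abs(inner) ** 2
    return (total / n) ** 0.25


average = weighted_four_term_average(f)
u2 = u2_norm(f)
```

For this `N` the $U^2$ norm is $31^{-1/4} \approx 0.42$, and it tends to zero as $N$ grows while the weighted average stays equal to 1.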

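Let us now briefly sketch how the theory works in the $k = 4$ case. The correct
notion of pseudorandomness is now given by the Gowers $U^3$ uniformity norm,
defined by

$$\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})}^8 := \mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} \|T^n f\, \overline{f}\|_{U^2(\mathbb{Z}/N\mathbb{Z})}^4.$$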
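The recursive definition of the $U^3$ norm can be evaluated directly for small moduli. In the sketch below a quadratic phase has $U^3$ norm exactly 1, whereas for the cubic phase $e^{2\pi i x^3/N}$ each non-trivial multiplicative derivative is a genuine quadratic phase with $\|\cdot\|_{U^2}^4 = 1/N$, so that $\|f\|_{U^3}^8 = (2N-1)/N^2$. The modulus `N = 13` and the function names are illustrative assumptions.

```python
import cmath


def u2_fourth_power(g):
    """||g||_{U^2}^4 = E_n |E_x g(x+n) conj(g(x))|^2."""
    n = len(g)
    total = 0.0
    for shift in range(n):
        inner = sum(g[(x + shift) % n] * g[x].conjugate() for x in range(n)) / n
        total += abs(inner) ** 2
    return total / n


def u3_norm(f):
    """||f||_{U^3}^8 = E_n || T^n f conj(f) ||_{U^2}^4."""
    n = len(f)
    total = 0.0
    for shift in range(n):
        derivative = [f[(x + shift) % n] * f[x].conjugate() for x in range(n)]
        total += u2_fourth_power(derivative)
    return (total / n) ** 0.125


def polynomial_phase(degree, n):
    """The phase x -> e(x^degree / n) on Z/nZ."""
    return [cmath.exp(2j * cmath.pi * pow(x, degree, n) / n) for x in range(n)]


N = 13  # small prime larger than 3, for illustration only
quadratic_u3 = u3_norm(polynomial_phase(2, N))
cubic_u3 = u3_norm(polynomial_phase(3, N))
```

The cost is cubic in $N$, which is acceptable for a sanity check but not for large moduli.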

This norm measures the extent to which $f$ behaves quadratically; for instance, if
$f = e^{2\pi i P(x)/N}$ for some polynomial $P$ of degree $k$ in the finite field $\mathbb{Z}/N\mathbb{Z}$, then
one can verify that $\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})} = 1$ if $P$ has degree at most 2, but (using the Weil
estimates) we have $\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})} = O_k(N^{-1/8})$ if $P$ has degree $k > 2$. Repeated
application of Cauchy-Schwarz then yields the generalized von Neumann theorem

$$\left| \mathbb{E}_{x,r \in \mathbb{Z}/N\mathbb{Z}} f_0(x)\, T^r f_1(x)\, T^{2r} f_2(x)\, T^{3r} f_3(x) \right| \leq \min_{0 \leq j \leq 3} \|f_j\|_{U^3(\mathbb{Z}/N\mathbb{Z})} \tag{9}$$

whenever $f_0, f_1, f_2, f_3$ are bounded in magnitude by 1. The next step is to establish
a dichotomy between quadratic structure and quadratic pseudorandomness in the
spirit of Lemma 3.2. In the original work of Gowers [?], it was shown that a
function which was not quadratically pseudorandom had local correlation with quadratic
phases on medium-length arithmetic progressions. This result (when combined with
the density increment argument of Roth) was already enough to prove (8) with a
reasonable bound on $c(4,\delta)$ (basically of the form $1/\exp(\exp(\delta^{-C}))$); see [?], [?].
Building upon this work, a stronger dichotomy, similar in spirit to Lemma 2.5,
was established in [?]. Here, a number of essentially equivalent formulations of
quadratic structure were established, but the easiest to state (and the one which
generalizes most easily to higher $k$) is that of a (basic) 2-step nilsequence, which
can be viewed as a notion of “quadratic almost periodicity” for sequences. More
precisely, a 2-step nilsequence is a sequence of the form $n \mapsto F(T^n x\Gamma)$, where $F$ is a
Lipschitz function on a 2-step nilmanifold $G/\Gamma$, $x\Gamma$ is a point in this nilmanifold,
and $T$ is a shift operator $T : x\Gamma \mapsto ax\Gamma$ for some fixed group element $a \in G$.
We remark that quadratic phase sequences such as $n \mapsto e^{2\pi i \alpha n^2}$ are examples of
2-step nilsequences, and generalized quadratics such as $n \mapsto e^{2\pi i \lfloor \sqrt{2} n \rfloor \sqrt{3} n}$ can
be written (outside of sets of arbitrarily small density) as 2-step nilsequences.

Lemma 3.6 (Dichotomy between randomness and structure) [?]. Suppose that
$f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is bounded in magnitude by 1 with $\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})} \geq \eta$ for some
$0 < \eta \leq 1$. Then there exists a 2-step nilsequence $n \mapsto F(T^n x\Gamma)$, where $G/\Gamma$
is a nilmanifold of dimension $O_\eta(1)$, and $F$ is a bounded Lipschitz function on $G/\Gamma$
with Lipschitz constant $O_\eta(1)$, such that $|\mathbb{E}_{1 \leq n \leq N} f(n) \overline{F(T^n x\Gamma)}| \geq c(\eta)$ for some
$c(\eta) > 0$. (We identify the integers from 1 to $N$ with $\mathbb{Z}/N\mathbb{Z}$ in the usual manner.)

In fact the nilmanifold $G/\Gamma$ constructed in [?] is of a very explicit form, being the
direct sum of at most $O_\eta(1)$ circles (which are one-dimensional), skew shifts (which
are two-dimensional), and Heisenberg nilmanifolds (which are three-dimensional).
The dimension $O_\eta(1)$ is in fact known to be polynomial in $\eta$, but the best bounds
for $c(\eta)$ are currently only exponential in nature. See [?] for further details and
discussion.

The proof of Lemma 3.6 is rather lengthy but can be summarized as follows.
If $f$ has large $U^3$ norm, then by definition $T^n f\, \overline{f}$ has large $U^2$ norm for many $n$.
Applying Lemma 3.2, this shows that for many $n$, $T^n f\, \overline{f}$ correlates with a linear
phase function of some frequency $\xi(n)$ (which can be viewed as a kind of “derivative”
of the phase of $f$ in the “direction” $n$). Some manipulations involving the
Cauchy-Schwarz inequality then show that $\xi(n)$ contains some additive structure (in that
there are many quadruples $n_1, n_2, n_3, n_4$ with $n_1 + n_2 = n_3 + n_4$ and
$\xi(n_1) + \xi(n_2) = \xi(n_3) + \xi(n_4)$). Methods from additive combinatorics (notably the
Balog-Szemerédi(-Gowers) theorem and Freiman’s theorem, see e.g. [?]) are then used
to “linearize” $\xi$, in the sense that $\xi(n)$ agrees with a (generalized) linear function
of $n$ on a large (generalized) arithmetic progression. One then “integrates” this
fact to conclude that $f$ itself correlates with a certain “anti-derivative” of $\xi(n)$,
which is a (generalized) quadratic function on this progression. This in turn can
be approximated by a 2-step nilsequence. For full details, see [?].
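In the model case of a genuine quadratic phase the “derivative” frequency can be computed exactly, which makes the first steps of this outline concrete. For $f(x) = e^{2\pi i a x^2/N}$ one has $T^n f(x) \overline{f(x)} = e^{2\pi i (2anx + an^2)/N}$, a linear phase of frequency $\xi(n) = 2an$, so $\xi$ is already linear and every additive quadruple is respected. The sketch below checks this; the values `N = 13` and `a = 3` and the helper names are hypothetical.

```python
import cmath

N = 13  # small odd prime, for illustration only
a = 3   # hypothetical quadratic coefficient
f = [cmath.exp(2j * cmath.pi * (a * x * x % N) / N) for x in range(N)]


def dominant_frequency(g):
    """Frequency xi maximising |E_x g(x) e^{-2 pi i x xi / N}|."""
    n = len(g)

    def size(xi):
        return abs(sum(g[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)))

    return max(range(n), key=size)


# xi[n] is the frequency of the multiplicative derivative T^n f conj(f).
xi = []
for shift in range(N):
    derivative = [f[(x + shift) % N] * f[x].conjugate() for x in range(N)]
    xi.append(dominant_frequency(derivative))

# Additive structure: xi(n1) + xi(n2) = xi(n3) + xi(n4) whenever n1 + n2 = n3 + n4.
additive = all(
    (xi[n1] + xi[n2] - xi[n3] - xi[(n1 + n2 - n3) % N]) % N == 0
    for n1 in range(N)
    for n2 in range(N)
    for n3 in range(N)
)
```

For a general function only a positive proportion of quadruples behaves this way, which is where the additive-combinatorial tools mentioned above are needed.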

Thus, quadratic nilsequences are the only obstruction to a function being
quadratically pseudorandom. This can be iterated to obtain structural results. The
following “weak” structural theorem is already quite useful:

Theorem 3.7 (Weak structure theorem) [?]. Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ be a function
bounded in magnitude by 1, and let $0 < \lambda < 1$. Then we can split $f = f_{U^\perp} + f_U$,
where $f_{U^\perp}$ is a 2-step nilsequence given by a nilmanifold of dimension $O_\lambda(1)$
and by a bounded Lipschitz function $F$ with Lipschitz constant $O_\lambda(1)$, and $f_U$ is
quadratically pseudorandom in the sense that $\|f_U\|_{U^3(\mathbb{Z}/N\mathbb{Z})} \leq \lambda$. Furthermore,
$f_{U^\perp}$ is non-negative, bounded by 1, and has the same mean as $f$.

This is an analogue of Theorem 3.4, and asserts that any bounded function
has only a bounded amount of quadratic structure, with the function becoming
quadratically pseudorandom once this structure is subtracted. It cannot be proven
in quite the same way as in Theorem 3.4, because we have no “quadratic Fourier
inversion formula” that decomposes a function neatly into quadratic components
(the problem being that there are so many quadratic objects that such a formula is
necessarily overdetermined). However, one can proceed by a finitary analogue of the
ergodic theory approach, known as an “energy increment argument”. In the ergodic
setting, one uses all the quadratic objects to create a σ-algebra $\mathcal{Z}_2$, and sets $f_{U^\perp}$ to
be the conditional expectation of $f$ with respect to that σ-algebra. In the finitary
setting, it turns out to be too expensive to try to use all the 2-step nilsequences to
create a σ-algebra. However, by adopting a more adaptive approach, selecting only
those 2-step nilsequences which have some significant correlation with $f$ (or some
component of $f$), one can obtain the above theorem as follows.

Proof. (Sketch) We perform the following iteration procedure. Initialize $\mathcal{Z}$ to be the
trivial σ-algebra $\{\emptyset, \mathbb{Z}/N\mathbb{Z}\}$. If $f - \mathbb{E}(f|\mathcal{Z})$ is already quadratically pseudorandom,
then stop the iteration. Otherwise, using Lemma 3.6 we know that $f - \mathbb{E}(f|\mathcal{Z})$
correlates with some 2-step nilsequence $g(n) = F(T^n x\Gamma)$. We take the level sets of $g$
(suitably discretized) and add them to the σ-algebra $\mathcal{Z}$; the correlation of $f - \mathbb{E}(f|\mathcal{Z})$
with $g$ ensures that the energy $\|\mathbb{E}(f|\mathcal{Z})\|_{L^2}^2$ will increase significantly (by some
amount $c(\lambda) > 0$) when doing so; this is essentially Pythagoras’ theorem. Because
$f$ is bounded by 1, the energy cannot exceed 1, and so the iteration will stop after
$O_\lambda(1)$ steps. When one does this, one obtains a splitting $f = \mathbb{E}(f|\mathcal{Z}) + (f - \mathbb{E}(f|\mathcal{Z}))$,
where $f - \mathbb{E}(f|\mathcal{Z})$ is quadratically pseudorandom, and $\mathbb{E}(f|\mathcal{Z})$ is the conditional
expectation of $f$ with respect to a bounded number of 2-step nilsequences. By
applications of Urysohn’s lemma, the Weierstrass approximation theorem, and the
fact that any polynomial combination of 2-step nilsequences is again a 2-step
nilsequence, we can approximate $\mathbb{E}(f|\mathcal{Z})$ to arbitrary accuracy by a 2-step nilsequence
$f_{U^\perp}$ of bounded complexity; by being careful one can also ensure that $f_{U^\perp}$ remains
non-negative and bounded by 1. Setting $f_U := f - f_{U^\perp}$, we obtain the claim. □

It is likely that quantitative versions of this structure theorem will improve the
known bounds on Szemerédi’s theorem in the $k = 4$ case; see [?], [?], [?]. A
closely related version of this argument was also essential in establishing Theorem
??; see Section ?? below.


4. Graph theory 

We now turn to the third major line of attack to Szemerédi’s theorem, based on
graph theory (and hypergraph theory), and which is perhaps the purest embodiment
of the strategy of exploiting the dichotomy between randomness and structure. For
graphs, the relevant structure theorem is the Szemerédi regularity lemma, which
was developed in [58] in the original proof of Szemerédi’s theorem, and has since
proven to have many further applications in graph theory and computer science; see
[?] for a survey. More recently, the analogous regularity lemmas for hypergraphs
have been developed in several papers [?]. Roughly speaking, these very
useful lemmas assert that any graph (binary relation) or hypergraph (higher order
relation), no matter how complex, can be modelled effectively as a pseudorandom
sub(hyper)graph of a finite complexity (hyper)graph. Returning to the setting
of the introduction, the graph regularity lemma would assert that there exists a
colouring of the integers into finitely many colours such that relations such as
$x - y \in A$ can be viewed approximately as pseudorandom relations, with the “probability”
of the event $x - y \in A$ depending only on the colour of $x$ and $y$.

The strategy of the graph theory approach is to abstract away the arithmetic
structure in Szemerédi’s theorem, converting the problem to one of finding solutions
to an abstract set of equations, which can be modeled by graphs or hypergraphs. As
before, we first illustrate this with the simple $k = 3$ case of Szemerédi’s
theorem, which we will take in the form of Theorem 3.1. For simplicity we specialize
to the case when $f$ is the indicator function of a set $A$ (which thus has density at
least $\delta$ in $\mathbb{Z}/N\mathbb{Z}$); it is easy to see (e.g. by probabilistic arguments) that this special
case in fact implies the general case. The key observation is that the problem of
locating an arithmetic progression of length three can be recast as the problem of
solving three constraints in three unknowns, where each constraint only involves
two of the unknowns. Specifically, if $x, y, z \in \mathbb{Z}/N\mathbb{Z}$ solve the system of constraints

$$\begin{aligned} y + 2z &\in A \\ -x + z &\in A \\ -2x - y &\in A \end{aligned} \tag{10}$$

then $y + 2z$, $-x + z$, $-2x - y$ is an arithmetic progression of length three in $A$.
Conversely, each such progression comes from exactly $N$ solutions to (10). Thus,
it will suffice to show that there are at least $c(3,\delta) N^3$ solutions to (10). Note
that we already can construct at least $\delta N^2$ “trivial solutions” to (10), in which
$y + 2z = -x + z = -2x - y$ is an element of $A$. Furthermore, these trivial solutions
$(x, y, z)$ are “edge-disjoint” in the sense that no two of these solutions share more
than one value in common (i.e. if $(x, y, z)$ and $(x', y', z')$ are distinct trivial solutions
then at most one of $x = x'$, $y = y'$, $z = z'$ is true). It turns out that these trivial
solutions automatically generate a large number of non-trivial solutions to (10),
without using any further arithmetic structure present in these constraints. Indeed,
the claim now follows from the following graph-theoretical statement.

Lemma 4.1 (Triangle removal lemma) [?]. For every $0 < \delta < 1$ there exists
$0 < \sigma < 1$ with the following property. Let $G = (V, E)$ be an (undirected) graph
with $|V| = N$ vertices which contains fewer than $\sigma N^3$ triangles. Then it is possible
to remove $O(\delta N^2)$ edges from $G$ to create a graph $G'$ which contains no triangles
whatsoever.

To see how the triangle removal lemma implies the claim, consider a vertex set
$V$ which consists of three copies $V_1, V_2, V_3$ of $\mathbb{Z}/N\mathbb{Z}$ (so $|V| = 3N$), and consider
the tripartite graph $G = (V, E)$ whose edges are of the form

$$E = \{(y,z) \in V_2 \times V_3 : y + 2z \in A\} \cup \{(x,z) \in V_1 \times V_3 : -x + z \in A\} \cup \{(x,y) \in V_1 \times V_2 : -2x - y \in A\}.$$

One can think of $G$ as a variant of the Cayley graph for $A$. Observe that solutions
to (10) are in one-to-one correspondence with triangles in $G$. Furthermore, the
$\delta N^2$ trivial solutions to (10) correspond to edge-disjoint triangles in $G$. Thus to
delete all the triangles one needs to remove at least $\delta N^2$ edges. Applying Lemma
4.1 in the contrapositive (adjusting $N, \delta, \sigma$ by constants such as 3 if necessary), we
see that $G$ contains at least $\sigma N^3$ triangles for some $\sigma = \sigma(\delta) > 0$, and the claim
follows.
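The correspondence between triangles of the tripartite graph and three-term progressions can be confirmed by exhaustive counting on a small example. In the sketch below every pair $(a, r)$ with $a, a+r, a+2r \in A$ accounts for exactly $N$ triangles, and the triangles with $x + y + z = 0$ are the trivial ones. The set `A` and the modulus `N = 13` are arbitrary illustrative choices.

```python
N = 13  # small prime, for illustration only
A = {0, 1, 3, 9}  # an arbitrary illustrative subset of Z/NZ


def count_three_aps(A, n):
    """Number of pairs (a, r) with a, a+r, a+2r all in A (r = 0 is allowed)."""
    return sum(
        1
        for a in range(n)
        for r in range(n)
        if a in A and (a + r) % n in A and (a + 2 * r) % n in A
    )


def count_triangles(A, n):
    """Number of (x, y, z) with y+2z, -x+z and -2x-y all in A."""
    count = 0
    for x in range(n):
        for y in range(n):
            if (-2 * x - y) % n not in A:
                continue  # no edge between x in V_1 and y in V_2
            for z in range(n):
                if (y + 2 * z) % n in A and (-x + z) % n in A:
                    count += 1
    return count


def count_trivial_solutions(A, n):
    """Solutions with x + y + z = 0, for which all three values coincide."""
    return sum(
        1
        for x in range(n)
        for y in range(n)
        for z in range(n)
        if (x + y + z) % n == 0 and (y + 2 * z) % n in A
    )


triangles = count_triangles(A, N)
progressions = count_three_aps(A, N)
trivial = count_trivial_solutions(A, N)
```

The count of trivial solutions is $N|A|$, matching the $\delta N^2$ lower bound used in the argument.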

The only known proof of the triangle removal lemma proceeds by a structure
theorem for graphs known as the Szemerédi regularity lemma. In order to emphasize
the similarities between this approach and the previously discussed approaches, we
shall not use the standard formulation of this lemma, but instead use a more recent
formulation from [?], [?] (see also [?], [?]), which replaces graphs with functions,
and then obtains a structure theorem decomposing such functions into a structured
(finite complexity) component, a small component, and a pseudorandom (regular)
component. More precisely, we work with functions $f : V \times V \to \mathbb{R}$; this can be
thought of as a weighted, directed generalization of a graph on $V$ in which every
edge $(x, y)$ is assigned a real-valued weight $f(x, y)$. The first step is to define a
notion of pseudorandomness. For graphs, this concept is well understood. There
are many equivalent formulations of this concept (see [?]), but we shall adopt one
particularly close to the analogous concepts in previous sections, by introducing
the Gowers $\Box^2$ cube norm as

$$\|f\|_{\Box^2}^4 := \mathbb{E}_{x,y,x',y' \in V} f(x,y)\, f(x,y')\, f(x',y)\, f(x',y');$$

when $f$ is the incidence function of a graph, the right-hand side essentially counts
the number of 4-cycles in that graph. Again, one can use the Cauchy-Schwarz
inequality to establish that the $\Box^2$ norm is indeed a norm; alternatively, one can
use spectral theory and observe that the $\Box^2$ norm is essentially the Schatten-von
Neumann $p$-norm of $f$ with $p = 4$. We refer to $f$ as pseudorandom if its $\Box^2$
norm is small. By two applications of Cauchy-Schwarz we have the generalized von
Neumann inequality

$$\left| \mathbb{E}_{x,y,z \in V} f(x,y)\, g(y,z)\, h(z,x) \right| \leq \min\left( \|f\|_{\Box^2}, \|g\|_{\Box^2}, \|h\|_{\Box^2} \right) \tag{11}$$

whenever $f, g, h$ are bounded in magnitude by 1 (note that this generalizes (7)).
The next step, as before, is to establish a dichotomy between pseudorandomness
and structure. The analogue of Lemma 2.2 or Lemma 3.2 is

Lemma 4.2 (Dichotomy between randomness and structure). Suppose that
$f : V \times V \to \mathbb{R}$ is bounded in magnitude by 1 with $\|f\|_{\Box^2} \geq \eta$ for some $0 < \eta \leq 1$.
Then there exist sets $A, B \subset V$ such that $|\mathbb{E}_{x,y \in V} f(x,y) 1_A(x) 1_B(y)| \geq \eta^4/4$. Here
$1_A(x)$ denotes the indicator function of $A$ (thus $1_A(x) = 1$ if $x \in A$ and $1_A(x) = 0$
otherwise).

Proof. By the definition of the $\Box^2$ norm and the pigeonhole principle, one can find
$x', y'$ such that

$$\left| \mathbb{E}_{x,y \in V} f(x,y)\, f(x,y')\, f(x',y)\, f(x',y') \right| \geq \eta^4.$$

By splitting $f(x, y')$ and $f(x', y)$ into positive and negative parts, we conclude that
there exist non-negative functions $a(x), b(y)$ bounded by 1 such that

$$\left| \mathbb{E}_{x,y \in V} f(x,y)\, a(x)\, b(y) \right| \geq \eta^4/4.$$

Now let $A, B$ be random subsets of $V$, with $x \in A$ and $y \in B$ holding with
independent probabilities $a(x)$ and $b(y)$ respectively. From linearity of expectation
we see that the expected value of $\mathbb{E}_{x,y \in V} f(x,y) 1_A(x) 1_B(y)$ has magnitude at least
$\eta^4/4$, and the claim follows. □
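The proof of Lemma 4.2 is effectively an algorithm, and it can be run on a small weighted graph. The sketch below follows the pigeonhole and splitting steps literally, but replaces the random choice of $A$ and $B$ by a deterministic rounding (the average is linear in $a$ for fixed $b$, so rounding $a$ and then $b$ to indicator functions cannot decrease its magnitude); this substitution, the function names and the random test matrix are assumptions made for illustration.

```python
import random


def box_norm(f):
    """Gowers cube norm: ||f||^4 = E_{x,y,x',y'} f(x,y) f(x,y') f(x',y) f(x',y')."""
    n = len(f)
    total = 0.0
    for x in range(n):
        for xp in range(n):
            inner = sum(f[x][y] * f[xp][y] for y in range(n))
            total += inner * inner
    return (total / n ** 4) ** 0.25


def correlation(f, a, b):
    """E_{x,y} f(x,y) a(x) b(y)."""
    n = len(f)
    return sum(f[x][y] * a[x] * b[y] for x in range(n) for y in range(n)) / n ** 2


def round_to_set(weights):
    """Indicator of the positive or the negative part, whichever weighs more."""
    pos = sum(w for w in weights if w > 0)
    neg = -sum(w for w in weights if w < 0)
    if pos >= neg:
        return [1.0 if w > 0 else 0.0 for w in weights]
    return [1.0 if w < 0 else 0.0 for w in weights]


def find_correlating_sets(f):
    n = len(f)
    pairs = [(xp, yp) for xp in range(n) for yp in range(n)]

    def weight(pair):
        xp, yp = pair
        column = [f[x][yp] for x in range(n)]
        return abs(correlation(f, column, f[xp]) * f[xp][yp])

    xp, yp = max(pairs, key=weight)  # pigeonhole over (x', y')
    best = None
    for sx in (1, -1):  # positive / negative part of x -> f(x, y')
        for sy in (1, -1):  # positive / negative part of y -> f(x', y)
            a = [max(sx * f[x][yp], 0.0) for x in range(n)]
            b = [max(sy * f[xp][y], 0.0) for y in range(n)]
            size = abs(correlation(f, a, b))
            if best is None or size > best[0]:
                best = (size, a, b)
    _, a, b = best
    ind_a = round_to_set([sum(f[x][y] * b[y] for y in range(n)) for x in range(n)])
    ind_b = round_to_set([sum(f[x][y] * ind_a[x] for x in range(n)) for y in range(n)])
    return ind_a, ind_b


random.seed(3)
n = 8
f = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
eta = box_norm(f)
ind_a, ind_b = find_correlating_sets(f)
achieved = abs(correlation(f, ind_a, ind_b))
```

The value `achieved` is the correlation of $f$ with $1_A(x) 1_B(y)$ for the sets found, and the lemma guarantees it is at least $\eta^4/4$.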

One can iterate this to obtain a weak version of the Szemerédi regularity lemma:

Theorem 4.3 (Weak structure theorem) [?]. Let $f : V \times V \to \mathbb{R}$ be a non-negative
function bounded by 1, and let $\varepsilon > 0$. Then we can decompose $f = f_{U^\perp} + f_U$, where
$f_{U^\perp} = \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$, $\mathcal{Z}$ is a σ-algebra of $V$ generated by at most $2/\varepsilon$ sets, and
$\|f_U\|_{\Box^2} \leq \varepsilon$.

Proof. (Sketch) We perform the following “energy increment argument” iteration,
as in Theorem 3.7. Initialize $\mathcal{Z}$ to be the trivial σ-algebra $\{\emptyset, V\}$ on $V$, thus the
tensor product $\mathcal{Z} \otimes \mathcal{Z}$ is the trivial σ-algebra on $V \times V$. If $f - \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$ has
a $\Box^2$ norm less than $\varepsilon$, stop the iteration. Otherwise, use Lemma 4.2 to find sets
$A, B$ such that $1_A(x) 1_B(y)$ correlates with $f - \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$. One then adds $A$ and
$B$ to the σ-algebra $\mathcal{Z}$; the correlation of $f - \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$ with $1_A(x) 1_B(y)$ ensures that
the energy $\|\mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})\|_{L^2}^2$ will increase significantly (by some amount $c(\varepsilon) > 0$)
when doing so; this is essentially Pythagoras’ theorem. Because $f$ is bounded by
1, the energy cannot exceed 1, and so the iteration will stop after $O_\varepsilon(1)$ steps.
When one does this, one obtains the desired splitting with $f_U := f - \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$ and
$f_{U^\perp} := \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$. □
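The two ingredients of the energy increment argument, conditional expectation onto the atoms of $\mathcal{Z} \otimes \mathcal{Z}$ and the Pythagorean identity for the energy, are easy to make concrete. The sketch below uses a complete bipartite graph on six vertices, where adding one side of the bipartition to the trivial σ-algebra raises the energy from $1/4$ to $1/2$; the example and the helper names are illustrative assumptions, not a full implementation of the iteration.

```python
def atom_labels(sets, n):
    """Label each vertex by its membership pattern in the generating sets."""
    return [tuple(v in s for s in sets) for v in range(n)]


def conditional_expectation(f, labels):
    """E(f | Z x Z): average f over each product of atoms."""
    n = len(f)
    sums, counts = {}, {}
    for x in range(n):
        for y in range(n):
            key = (labels[x], labels[y])
            sums[key] = sums.get(key, 0.0) + f[x][y]
            counts[key] = counts.get(key, 0) + 1
    return [
        [sums[(labels[x], labels[y])] / counts[(labels[x], labels[y])] for y in range(n)]
        for x in range(n)
    ]


def energy(g):
    """Squared L^2 norm E_{x,y} g(x,y)^2."""
    n = len(g)
    return sum(v * v for row in g for v in row) / (n * n)


n = 6
half = set(range(n // 2))
# Indicator function of the complete bipartite graph between `half` and its complement.
f = [[1.0 if (x in half) != (y in half) else 0.0 for y in range(n)] for x in range(n)]

coarse = conditional_expectation(f, atom_labels([], n))       # trivial sigma-algebra
refined = conditional_expectation(f, atom_labels([half], n))  # after adding one set
residual = [[f[x][y] - coarse[x][y] for y in range(n)] for x in range(n)]
```

Since the energy is bounded by 1 and each genuine increment is bounded below, such refinements can only happen boundedly often, which is the whole content of the stopping argument.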

As with Theorem 3.4, the above theorem is too weak to be of much use, because
the control one has on the pseudorandomness of $f_U$ is fairly poor compared to the
control on the complexity of $f_{U^\perp}$. The following strong version of the regularity
lemma is far more useful (compare with Theorem 3.5):

Theorem 4.4 (Strong structure theorem). Let $f : V \times V \to \mathbb{R}$ be a
non-negative function bounded by 1, and let $\varepsilon > 0$. Let $F : \mathbb{N} \to \mathbb{N}$ be an arbitrary
increasing function (e.g. $F(n) = 2^n$). Then there exists an integer $T = O_{F,\varepsilon}(1)$
and a decomposition $f = f_{U^\perp} + f_S + f_U$, where $f_{U^\perp} = \mathbb{E}(f|\mathcal{Z} \otimes \mathcal{Z})$, $\mathcal{Z}$ is generated by
at most $T$ sets in $V$, $f_U$ is pseudorandom in the sense that $\|f_U\|_{\Box^2} = O(1/F(T))$,
and $f_S$ is small in the sense that $\|f_S\|_{L^2(V \times V)} := (\mathbb{E}_{x,y \in V} |f_S(x,y)|^2)^{1/2} = O(\varepsilon)$.
Furthermore, $f_{U^\perp}, f_U$ are bounded in magnitude by 1. Also, $f_{U^\perp}$ and $f_{U^\perp} + f_S$ are
non-negative and bounded by 1.

Proof. (Sketch) We repeat the energy increment argument from Theorem 4.3, but
supplement it with an application of the pigeonhole principle. Construct a sequence
$\mathcal{Z}^{(0)} \subset \mathcal{Z}^{(1)} \subset \ldots$ of σ-algebras on $V$, with $\mathcal{Z}^{(0)}$ being the trivial algebra, and each
$\mathcal{Z}^{(n+1)}$ formed by adding two sets $A, B$ to $\mathcal{Z}^{(n)}$ in such a way as to maximize
the energy $E_{n+1} := \|\mathbb{E}(f|\mathcal{Z}^{(n+1)} \otimes \mathcal{Z}^{(n+1)})\|_{L^2}^2$. From Pythagoras’s theorem we
see that the $E_n$ are increasing, but are also bounded between 0 and 1. From the
pigeonhole principle†, one can thus find a positive integer $n = O_{F,\varepsilon}(1)$ such that
$E_{n + F(2n)^8 + 1} \leq E_n + \varepsilon^2$. A further application of the pigeonhole principle then
allows us to find $n \leq n' \leq n + F(2n)^8$ such that $E_{n'+1} \leq E_{n'} + 1/F(2n)^8$. We now
set

$$f_{U^\perp} := \mathbb{E}(f|\mathcal{Z}^{(n)} \otimes \mathcal{Z}^{(n)}); \quad f_S := \mathbb{E}(f|\mathcal{Z}^{(n')} \otimes \mathcal{Z}^{(n')}) - \mathbb{E}(f|\mathcal{Z}^{(n)} \otimes \mathcal{Z}^{(n)}); \quad f_U := f - \mathbb{E}(f|\mathcal{Z}^{(n')} \otimes \mathcal{Z}^{(n')})$$

and $\mathcal{Z} := \mathcal{Z}^{(n)}$. Since $E_{n'} \leq E_n + \varepsilon^2$ we see from Pythagoras’ theorem that $f_S$
has an $L^2$ norm of $O(\varepsilon)$. Finally, since $E_{n'+1} \leq E_{n'} + 1/F(2n)^8$, the arguments in
Theorem 4.3 give $\|f_U\|_{\Box^2} = O(1/F(2n))$. Setting $T := 2n$ we obtain the claim. □
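The first pigeonhole step in the proof above is a finitary form of the convergence of bounded monotone sequences, and the search it describes terminates quickly. The sketch below walks along the scales $n_1 := 1$, $n_{i+1} := n_i + F(n_i)$ until the energy gain over one scale is at most $\varepsilon$; each failed step raises the energy by more than $\varepsilon$, so at most $1/\varepsilon$ steps can fail. The energy sequence $E_n = 1 - 1/(n+1)$, the growth function $F(n) = n^2$ and the function names are hypothetical choices for illustration.

```python
def find_stable_scale(energy, growth, eps):
    """Return (n, steps) with energy(n + growth(n)) <= energy(n) + eps.

    Assumes `energy` is non-decreasing with values in [0, 1], so that the
    loop performs at most 1/eps iterations.
    """
    n = 1
    steps = 0
    while energy(n + growth(n)) > energy(n) + eps:
        n = n + growth(n)  # jump to the next scale n_{i+1} = n_i + F(n_i)
        steps += 1
    return n, steps


def example_energy(n):
    """Hypothetical increasing energy sequence bounded by 1."""
    return 1.0 - 1.0 / (n + 1)


def example_growth(n):
    """Hypothetical growth function F(n) = n^2."""
    return n * n


eps = 0.1
n, steps = find_stable_scale(example_energy, example_growth, eps)
# The scales visited are 1, 2, 6, 42, and the search stops at n = 42.
```

The bound on the number of steps depends only on $\varepsilon$, but the scale $n$ reached depends on how fast $F$ grows, which is the source of the $O_{F,\varepsilon}(1)$ dependence in the theorem.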

We remark that one could also prove Theorem I4.4l hv a technique more similar to 
that used to prove Theorem lH . 5l bv viewing / as a matrix and using its singular value 
decomposition (or eigenvalue decomposition, if / is symmetric) as a substitute for 
the Fourier inversion formula. We omit the details. One can view fur as a “coarse” 

®Here we are exploiting a finitary version of the well-known fact that every bounded monotone
sequence is convergent. The finitary version is that if $E_n$ is an increasing sequence bounded above
by 1, $\varepsilon > 0$, and $F : \mathbf{N} \to \mathbf{N}$, then there exists $n = O_{F,\varepsilon}(1)$ such that $E_{n+F(n)} \leq E_n + \varepsilon$.
This follows by defining a sequence $n_1, n_2, \ldots$ recursively by $n_1 := 1$ and $n_{i+1} := n_i + F(n_i)$ and
observing from the pigeonhole principle that $E_{n_{i+1}} < E_{n_i} + \varepsilon$ for some $i = O(1/\varepsilon)$.

approximation to $f$, as it is measurable with respect to a fairly low-complexity
$\sigma$-algebra, and $f_{U^\perp} + f_S = \mathbf{E}(f|\mathcal{Z}^{(n')} \otimes \mathcal{Z}^{(n')})$ as a “fine” approximation to $f$,
which is considerably more complex but is also a far better approximation to $f$; in
fact the accuracy of the fine approximation exceeds the complexity of the coarse
approximation by any specified growth function $F$. Also the difference between the
coarse and fine approximations is controlled by an arbitrarily small constant $\varepsilon$.
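
The pigeonhole argument in the footnote above is effectively a search procedure, and a minimal Python sketch may make it concrete. The function name `find_stable_scale`, the sample sequence E(n) = 1 - 1/n, and the growth function F(n) = n^2 are illustrative assumptions and do not come from the text.

```python
def find_stable_scale(E, F, eps):
    """Search along n_1 = 1, n_{i+1} = n_i + F(n_i) for an index n
    with E(n + F(n)) < E(n) + eps.

    Assumes E is increasing with values in [0, 1], so the gains along
    consecutive jumps add up to at most 1 and only about 1/eps of
    them can be eps or larger.
    """
    n = 1
    max_steps = int(1 / eps) + 2  # pigeonhole bound on the number of jumps
    for step in range(max_steps):
        if E(n + F(n)) < E(n) + eps:
            return n, step
        n = n + F(n)  # move on to the next scale
    raise ValueError("E is not an increasing sequence with values in [0, 1]")


# Hypothetical example: E(n) = 1 - 1/n, F(n) = n^2, eps = 0.1.
n, steps = find_stable_scale(lambda n: 1 - 1 / n, lambda n: n * n, 0.1)
print(n, steps)  # 42 3
```

Note that the index found depends on both F and eps, but not on any finer property of the sequence, which is the sense in which n = O_{F,eps}(1).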

Theorem 4.4 already easily implies the Szemeredi regularity lemma in its traditional
formulation; see m. It also implies Lemma 4.1, similar to how Theorem 3.5
implies the $k = 3$ version of Szemeredi’s theorem. We sketch the proof as follows.
Set $f$ to be the indicator function of $G$, thus

(12) $\mathbf{E}_{x,y,z \in V} f(x,y) f(y,z) f(z,x) < \sigma$.

Apply Theorem 4.4 to obtain a decomposition $f = f_{U^\perp} + f_S + f_U$, where $F$ and $\varepsilon$
are to be chosen later. The $\sigma$-algebra $\mathcal{Z}$ is generated by at most $T$ sets, and thus
has at most $2^T$ atoms. We now use this decomposition to remove some “irregular”
components of $G$. First we remove from $G$ all edges with at least one vertex lying
in an atom which is “small” in the sense that its cardinality is less than $\delta N/2^T$; this
costs us at most $O(\delta N^2)$ edges. We also remove from $G$ all edges connecting a pair
of atoms $A, B$ on which $f_S$ is “large” in the sense that $\mathbf{E}_{x \in A, y \in B} |f_S(x,y)|^2 \geq \varepsilon^2/\delta$;
this also costs us at most $O(\delta N^2)$ edges. Finally, we remove from $G$ all edges
connecting a pair of atoms $A, B$ on which $f_{U^\perp}$ is smaller than $\delta$ (or equivalently,
$\mathbf{E}_{x \in A, y \in B} f(x,y) < \delta$); this also costs us $O(\delta N^2)$ edges. After all these removals,
the only pairs of atoms $A, B$ which still contribute to the reduced graph $G'$ are
those which are large (so that $|A|, |B| \geq \delta N/2^T$), on which $f_{U^\perp}$ is larger than $\delta$,
and on which $|f_S|^2$ has mean less than $\varepsilon^2/\delta$. Let us call such pairs $(A, B)$ “good”.

Now suppose that this reduced graph $G'$ still contains at least one triangle. Then
there must be three atoms $A, B, C$ such that all three pairs $(A, B)$, $(B, C)$, $(C, A)$
are good. In particular from the largeness of $f_{U^\perp}$ we have

$\mathbf{E}_{x \in A, y \in B, z \in C} f_{U^\perp}(x,y) f_{U^\perp}(y,z) f_{U^\perp}(z,x) > \delta^3$

and then by the smallness of $f_S$ we have

$\mathbf{E}_{x \in A, y \in B, z \in C} (f_{U^\perp} + f_S)(x,y) (f_{U^\perp} + f_S)(y,z) (f_{U^\perp} + f_S)(z,x) > \delta^3 - O(\varepsilon^2/\delta)$

and thus by the largeness of $A, B, C$ and the non-negativity of $f_{U^\perp} + f_S$

$\mathbf{E}_{x,y,z \in V} (f_{U^\perp} + f_S)(x,y) (f_{U^\perp} + f_S)(y,z) (f_{U^\perp} + f_S)(z,x) > [\delta^3 - O(\varepsilon^2/\delta)] \delta^3/2^{3T}$.

Now by the generalized von Neumann theorem dJ and the pseudorandomness
of $f_U$ we have

$\mathbf{E}_{x,y,z \in V} f(x,y) f(y,z) f(z,x) > [\delta^3 - O(\varepsilon^2/\delta)] \delta^3/2^{3T} - O(1/F(T))$.

If we choose $\varepsilon$ to be a small multiple of $\delta^2$, and $F(T)$ to be a large multiple of
$2^{3T}/\delta^6$, we thus have

$\mathbf{E}_{x,y,z \in V} f(x,y) f(y,z) f(z,x) \gg \delta^6/2^{3T} \geq c(\delta)$

for some $c(\delta) > 0$ (since $T = O_{F,\varepsilon}(1) = O_\delta(1)$). This will contradict (12) if $\sigma$ is
sufficiently small. Thus $G'$ does not contain any triangles, and we are done.
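
The quantity that drives this argument is the normalized triangle count, that is, the average of f(x,y) f(y,z) f(z,x) over all ordered triples of vertices. The following brute-force Python sketch computes it for a small graph; the function name `triangle_density` and the example graphs are illustrative assumptions, and no attempt is made at efficiency.

```python
from itertools import product


def triangle_density(vertices, edges):
    """Average of f(x,y) f(y,z) f(z,x) over all ordered triples (x, y, z)
    of vertices, where f is the indicator function of the edge set."""
    edge_set = {frozenset(e) for e in edges}

    def f(x, y):
        return 1 if frozenset((x, y)) in edge_set else 0

    n = len(vertices)
    total = sum(f(x, y) * f(y, z) * f(z, x)
                for x, y, z in product(vertices, repeat=3))
    return total / n ** 3


# Complete graph on 4 vertices: 24 of the 64 ordered triples are triangles.
V = [0, 1, 2, 3]
K4 = [(a, b) for a in V for b in V if a < b]
print(triangle_density(V, K4))  # 0.375

# A 4-cycle is bipartite and so contains no triangles at all.
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(triangle_density(V, C4))  # 0.0
```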

As in the other two approaches, the above arguments extend (with some additional
difficulties) to higher values of $k$. Again we restrict attention to the $k = 4$
case for simplicity. To locate a progression of length four in a set $A \subset \mathbf{Z}/N\mathbf{Z}$ is
now equivalent to solving the system of constraints

(13)
$$\begin{array}{rrrrl}
    &     y & +2z & +3w & \in A \\
 -x &       &  +z & +2w & \in A \\
-2x &    -y &     &  +w & \in A \\
-3x &   -2y &  -z &     & \in A.
\end{array}$$

This in turn follows from a hypergraph analogue of the triangle removal lemma.

Define a 3-uniform hypergraph to be a pair $H = (V, E)$ where $V$ is a finite set of
vertices and $E$ is a finite set of unordered triplets $(x, y, z)$ in $V$, which we refer to as
the edges of $H$. Define a tetrahedron in $H$ to be a quadruple $(x, y, z, w)$ of vertices
such that all four triplets $(x, y, z)$, $(y, z, w)$, $(z, w, x)$, $(w, x, y)$ are edges of $H$.

Lemma 4.5 (Tetrahedron removal lemma). [HI For every $0 < \delta < 1$ there exists
$0 < \sigma < 1$ with the following property. Let $H = (V, E)$ be a 3-uniform hypergraph
with $|V| = N$ vertices which contains fewer than $\sigma N^4$ tetrahedra. Then it is
possible to remove $O(\delta N^3)$ edges from $H$ to create a hypergraph $H'$ which contains
no tetrahedra whatsoever.
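
As a sanity check on the system of constraints (13), the short Python sketch below evaluates the four linear forms modulo N and confirms that they form an arithmetic progression with common difference -(x+y+z+w). It assumes the four forms are y+2z+3w, -x+z+2w, -2x-y+w and -3x-2y-z, so that the i-th form omits the i-th variable; the function names and the modulus 101 are illustrative choices.

```python
def four_forms(x, y, z, w, N):
    """The four linear forms of the system, reduced mod N.
    The i-th form does not involve the i-th variable."""
    return [
        (y + 2 * z + 3 * w) % N,
        (-x + z + 2 * w) % N,
        (-2 * x - y + w) % N,
        (-3 * x - 2 * y - z) % N,
    ]


def is_progression(values, N):
    """True if all consecutive differences agree mod N."""
    diffs = {(b - a) % N for a, b in zip(values, values[1:])}
    return len(diffs) == 1


N = 101
forms = four_forms(5, 17, 42, 8, N)
print(forms, is_progression(forms, N))
```

The fact that each form ignores one variable is what lets membership in A be encoded as the edge set of a 3-uniform hypergraph, with a solution to the system corresponding to a tetrahedron.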


Letting $f$ be the indicator function of $H$, we now have a situation where

$\mathbf{E}_{x,y,z,w \in V} f(x,y,z) f(y,z,w) f(z,w,x) f(w,x,y) < \sigma$

and we need to remove some small components from $f$ so that this average now
vanishes completely. Again, the key step here is to obtain a structure theorem that
decomposes $f$ into structured parts, small errors, and pseudorandom errors. The
notion of pseudorandomness is now captured by the Gowers $\Box^3$ cube norm, defined
by

$\|f\|_{\Box^3}^8 := \mathbf{E}_{x,y,z,x',y',z' \in V} f(x,y,z) f(x,y,z') f(x,y',z) f(x,y',z') f(x',y,z) f(x',y,z') f(x',y',z) f(x',y',z')$


which, in the case when $f$ is the indicator function of a hypergraph $H$, is essentially
counting the number of octahedra present in $H$. One can obtain a strong
structure theorem analogous to Theorem 4.4, but with one significant difference.
In Theorem 4.4 the structured component $f_{U^\perp}(x, y)$ can be broken up into a small
number of components which are of the form $1_A(x) 1_B(y)$. In the 3-uniform
hypergraph analogue of Theorem 4.4, the structured component $f_{U^\perp}(x, y, z)$ will be
broken up into a small number of components of the form $1_A(x, y) 1_B(y, z) 1_C(z, x)$.
It turns out that in order to conclude the proof of Lemma 4.5, this structural
decomposition is not sufficient by itself; one must also turn to the functions $1_A(x, y)$,
$1_B(y, z)$, $1_C(z, x)$ generated by this structure theorem and decompose them further,
essentially by invoking Theorem 4.4. This leads to some technical complications
in the argument, although this approach to Szemeredi’s theorem is still the most
elementary and self-contained. See EH, ESI, EZI> I1H1> EH- ED for details.
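
To make the cube norm concrete, here is a brute-force Python sketch of its eighth power: the average, over six vertices x, y, z, x', y', z', of the product of f over the eight triples obtained by choosing one vertex from each of the pairs (x, x'), (y, y'), (z, z'). The function name `cube_norm_8` and the test functions are illustrative assumptions; the cost grows like |V|^6, so this is only usable on tiny vertex sets.

```python
from itertools import product


def cube_norm_8(f, vertices):
    """Eighth power of the cube norm of f: the average over
    (x, y, z, x', y', z') of the product of f over the 8 triples
    (a, b, c) with a in {x, x'}, b in {y, y'}, c in {z, z'}."""
    n = len(vertices)
    total = 0.0
    for x, y, z, x2, y2, z2 in product(vertices, repeat=6):
        term = 1.0
        for a, b, c in product((x, x2), (y, y2), (z, z2)):
            term *= f(a, b, c)
            if term == 0.0:
                break  # one missing face already kills this octahedron
        total += term
    return total / n ** 6


# A constant function c has cube norm exactly c, so the eighth power is c^8.
print(cube_norm_8(lambda a, b, c: 0.5, [0, 1, 2]))  # 0.00390625
```

When f is the indicator function of a hypergraph, each nonzero term corresponds to a (possibly degenerate) octahedron all of whose eight faces are edges.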


5. The primes 

Having surveyed the three major approaches to Szemeredi’s theorem, we now 
turn to the question of counting progressions in the primes (or in dense subsets 
of the primes). The major new difficulty here, of course, is that the primes have 
asymptotically zero density rather than positive density, and even the most recent 
quantitative bounds on Szemeredi’s theorem (see the discussion after Theorem 3.1)
are not strong enough by themselves to overcome the “thinness” of the primes.
However, it turns out that the primes (and functions supported on the primes)
are still within the range of applicability of structure theorems. For instance, to
oversimplify dramatically, the structure theorem in m essentially® represents the
primes (or any dense subset of the primes) as a (sparse) pseudorandom subset of a
set of positive density. Since sets of positive density already contain many progressions
thanks to Szemeredi’s theorem, it turns out that enough of these progressions
survive when passing to a pseudorandom subset that one can conclude Theorem 1.2.

Interestingly, Theorem 1.2 can be tackled by (quantitative) ergodic methods,
by Fourier-analytic methods, and by graph-theoretic methods, with the three
approaches leading to slightly different results. For instance, the establishment of
infinitely many progressions of length three in the primes by van der Corput m
was Fourier-analytic, as was the corresponding statement for dense subsets of the
primes (i.e. the $k = 3$ case of Theorem 1.2), proven 76 years later by Green [22].
The argument in m which proves Theorem 1.2 in full combines ideas from all
three approaches, but is closest in spirit to the ergodic approach, albeit set in the
finitary context of a cyclic group $\mathbf{Z}/N\mathbf{Z}$ rather than on an infinitary measure space.
The argument in which shows that the Gaussian primes (or any dense subset
thereof) contain infinitely many constellations of any prescribed shape, and
can be viewed as a two-dimensional analogue of Theorem 1.2, was proven via the
(hyper)graph-theoretical approach. Finally, a more recent argument in m, EH,
in which precise asymptotics for the number of progressions of length four in the
primes are obtained, as well as a “quadratic pseudorandomness” estimate on a
renormalized counting function for the primes, proceeds by returning to the
original Fourier-analytic approach, but now using quadratic Fourier-analytic tools
(Lemma 3.6 and Theorem EH) rather than linear ones.

As mentioned in the introduction, these results are discussed in other surveys
EH, EH, EH, EH, and we will only sketch some highlights here. In all the
results, the strategy is to try to isolate the “structured” component of the primes
from the “pseudorandom” component. There is some obvious structure present in
the primes; for instance, they are almost all odd, they are almost all coprime to
three, and so forth. This obvious structure can be normalized away fairly easily.
For instance, to remove the bias the primes have towards being odd, one can replace
the primes $P = \{2, 3, 5, \ldots\}$ with the renormalized set $P_{2,1} := \{n : 2n + 1 \hbox{ prime}\} =
\{1, 2, 3, 5, \ldots\}$. Each arithmetic progression in $P_{2,1}$ clearly induces a corresponding
progression in $P$, but the set $P_{2,1}$ has no bias modulo 2. More generally, to reduce
all the bias present in residue classes mod $p$ for all $p < w$ (where $w$ is a medium-sized
parameter to be chosen later), one can work with a set $P_{W,b} := \{n : Wn + b \hbox{ prime}\}$,
where $W$ is the product of all the primes less than $w$ and $1 \leq b < W$ is a number
coprime to $W$. This “$W$-trick” allows for some technical simplifications.
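
The renormalized sets described above are easy to compute for small parameters. The Python sketch below builds W from w and lists the initial elements of the set of n with Wn + b prime; the helper names `is_prime` and `w_trick_set` are illustrative, and trial division is used only because the inputs are tiny.

```python
from math import gcd


def is_prime(m):
    """Trial division; adequate for small illustrative inputs."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def w_trick_set(w, b, limit):
    """Return W (the product of the primes less than w) together with
    the n in [1, limit] for which W*n + b is prime."""
    W = 1
    for p in range(2, w):
        if is_prime(p):
            W *= p
    if not (1 <= b < W and gcd(b, W) == 1):
        raise ValueError("need 1 <= b < W with b coprime to W")
    return W, [n for n in range(1, limit + 1) if is_prime(W * n + b)]


print(w_trick_set(3, 1, 5))   # (2, [1, 2, 3, 5])
print(w_trick_set(5, 1, 10))  # (6, [1, 2, 3, 5, 6, 7, 10])
```

An arithmetic progression n, n+r, n+2r, ... inside the second list maps to the progression Wn+b, W(n+r)+b, ... of primes, which is why nothing is lost by passing to the renormalized set.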


®This is a gross oversimplification. The precise statement is that after eliminating obvious 
irregularities in the primes caused by small residue classes, and excluding a small and technical 
exceptional set, a normalized counting function on the primes can be decomposed as a bounded 
function (which is thus spread out over a set of positive density), plus a pseudorandom error. 
Ignoring the initial elimination of obvious irregularities and the exceptional set, and pretending 
the bounded function was the indicator function of a positive density set A, one recovers the 
interpretation of the primes as a sparse pseudorandom subset of A.

Next, it is convenient not to work with the primes as a set, but rather as a
renormalized counting function. One convenient choice is the von Mangoldt function
$\Lambda(n)$, defined as $\log p$ if $n$ is a power of a prime $p$ and 0 otherwise. Actually,
because of the $W$-trick, it is better to consider a renormalized von Mangoldt function
such as $\Lambda_{W,b}(n) := \frac{\phi(W)}{W} \Lambda(Wn + b)$, where $\phi(W)$ is the Euler totient function
of $W$. The prime number theorem in arithmetic progressions asserts that the
asymptotic average value of $\Lambda_{W,b}(n)$ is equal to 1. To establish progressions of length
$k$ in the primes, it suffices to obtain a nontrivial lower bound for the asymptotic
value of the average

(14) $\mathbf{E}_{1 \leq n, r \leq N} \Lambda_{W,b}(n) \Lambda_{W,b}(n + r) \ldots \Lambda_{W,b}(n + (k - 1)r)$.

In fact this quantity is conjectured to asymptotically equal 1 as $W, N \to \infty$, with $W$
growing much slower than $N$ (a special case of the Hardy-Littlewood prime tuples
conjecture); the intuition is that by removing all the bias present in the small
residue classes, we have eliminated all the “obvious” structure in the primes, and
the renormalized function $\Lambda_{W,b}$ should now fluctuate pseudorandomly around its
mean value 1. However, this conjecture has only been verified in the cases $k = 3, 4$
(leading to an asymptotic count for the number of progressions of primes of length
$k$ less than a large number $N$); for the cases $k > 4$ we only have a lower bound of
$c(k)$ for some small $c(k) > 0$.
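
The normalization by phi(W)/W can be checked numerically. The Python sketch below implements the von Mangoldt function by trial division and averages the renormalized function over 1 <= n <= N; the function names and the parameters W = 6, b = 1, N = 2000 are illustrative assumptions, and the output is a finite-range approximation rather than an asymptotic statement.

```python
from math import gcd, log


def von_mangoldt(m):
    """Lambda(m) = log p if m is a power of a prime p, and 0 otherwise."""
    if m < 2:
        return 0.0
    p = 2
    while p * p <= m:
        if m % p == 0:
            # p is the smallest prime factor; strip it out completely.
            while m % p == 0:
                m //= p
            return log(p) if m == 1 else 0.0
        p += 1
    return log(m)  # no factor found, so m itself is prime


def euler_phi(W):
    """Euler totient, by direct count (fine for small W)."""
    return sum(1 for a in range(1, W + 1) if gcd(a, W) == 1)


def lambda_wb(n, W, b):
    """Renormalized von Mangoldt function (phi(W)/W) * Lambda(W*n + b)."""
    return euler_phi(W) / W * von_mangoldt(W * n + b)


def mean_lambda_wb(N, W, b):
    """Average of lambda_wb(n) over 1 <= n <= N."""
    return sum(lambda_wb(n, W, b) for n in range(1, N + 1)) / N


print(mean_lambda_wb(2000, 6, 1))  # expected to be close to 1
```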

Let us cheat slightly by pretending that $\Lambda_{W,b}$ is a function on the cyclic group
$\mathbf{Z}/N\mathbf{Z}$ rather than on the integers $\mathbf{Z}$; there are some minor technical truncation
issues that need to be addressed to pass from one to the other but we shall ignore
them here. In order to show that (14) is close to 1, an obvious way to proceed would
be to establish some kind of pseudorandomness control on the deviation $\Lambda_{W,b} - 1$
from the mean, and then some sort of generalized von Neumann theorem to show
that this deviation is negligible. Based on the experience with Szemeredi’s theorem,
one would expect linear pseudorandomness to be the correct notion for $k = 3$,
quadratic pseudorandomness for $k = 4$, and so forth. In the $k = 3$ case it is indeed
a standard computation (using Vinogradov’s method, or a modern variant of that
method such as the one based on Vaughan’s identity) to show that $\Lambda_{W,b} - 1$ has
small Fourier coefficients, which is a reasonable proxy for linear pseudorandomness;
the point being that the $W$-trick has eliminated all the “major arcs” which would
otherwise destroy the pseudorandomness. It then remains to obtain a generalized
von Neumann theorem, similar to 0 . In preceding sections, one was working with 
functions that were bounded (and hence square integrable), and one could obtain 
these theorems easily from Plancherel’s theorem. In the current setting, the $L^2$
estimates on $\Lambda_{W,b}$ are unfavourable, and what one needs instead is some sort of $l^p$
bound on the Fourier coefficients of $\Lambda_{W,b}$ for some $2 < p < 3$. This can be done
by a more careful application of Vinogradov’s method, but can also be achieved
using harmonic analysis methods arising from restriction theory; see [52], | 2 H|. The
key new insight here is that while the Fourier coefficients of $\Lambda_{W,b}$ are difficult to
understand directly, one can majorize $\Lambda_{W,b}$ pointwise by (a constant multiple of) a
much better behaved function $\nu$ of comparable size, whose Fourier coefficients are
much easier to obtain bounds for (indeed $\nu$ is essentially linearly pseudorandom
once one subtracts off its mean, which is essentially 1). This “enveloping sieve” $\nu$ is
essentially the Selberg upper bound sieve, and can be viewed as a “smoothed out”
version^ of $\Lambda_{W,b}$. Restriction theory (related to the method of the large sieve) is
then used to pass from Fourier control of $\nu$ to Fourier control of $\Lambda_{W,b}$.
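
The remark that the W-trick eliminates the major arcs can be illustrated in the simplest case W = 2. The Python sketch below compares one Fourier coefficient on Z/NZ, at the frequency N/2 that detects parity, for the plain von Mangoldt function and for its renormalization with W = 2, b = 1. The normalization of the Fourier coefficient, the function names, and the choice N = 2000 are assumptions made for illustration; this is a numerical experiment, not part of any proof.

```python
import cmath
from math import log


def von_mangoldt(m):
    """Lambda(m) = log p if m is a power of a prime p, and 0 otherwise."""
    if m < 2:
        return 0.0
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            return log(p) if m == 1 else 0.0
        p += 1
    return log(m)


def fourier_coefficient(values, xi):
    """(1/N) * sum over n in Z/NZ of values[n] * exp(-2*pi*i*n*xi/N)."""
    N = len(values)
    return sum(v * cmath.exp(-2j * cmath.pi * n * xi / N)
               for n, v in enumerate(values)) / N


N = 2000
plain = [von_mangoldt(n) for n in range(N)]                  # Lambda(n)
shifted = [0.5 * von_mangoldt(2 * n + 1) for n in range(N)]  # W = 2, b = 1

# The frequency xi = N/2 pairs each value with (-1)^n.
print(abs(fourier_coefficient(plain, N // 2)))    # large: primes are mostly odd
print(abs(fourier_coefficient(shifted, N // 2)))  # small: parity bias removed
```

The first coefficient is comparable to the mean of the function, reflecting the bias of the primes modulo 2, while the second is much smaller because the renormalized function has no such bias.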

A similar idea was used in [22], HH] to establish the $k = 3$ case of Theorem 1.2;
we sketch the argument from m here as follows. The main objective is to establish
a lower bound for expressions such as

(15) $\mathbf{E}_{x,r \in \mathbf{Z}/N\mathbf{Z}} \Lambda_{W,b} 1_A(x) \Lambda_{W,b} 1_A(x + r) \Lambda_{W,b} 1_A(x + 2r)$

for large sets $A$. Restriction theory still allows us to obtain good $l^p$ upper bounds for
the Fourier coefficients of $\Lambda_{W,b} 1_A$. This functions as a substitute for Plancherel’s
theorem (which is not favourable here), and one can now obtain structure theorems
such as Theorem 3.4 (and with some more effort, Theorem 3.5). This decomposes
$\Lambda_{W,b} 1_A$ into some structured component $f_{U^\perp}$ and a linearly pseudorandom
component $f_U$. The generalized von Neumann theorem lets us dispose of the contribution
of $f_U$ to (15), so let us focus on $f_{U^\perp}$. One can try to use the complexity bound
on $f_{U^\perp}$ (controlling the number of linear phases that comprise $f_{U^\perp}$) to get some
lower bound here, but this would require developing a strong structure theorem
analogous to Theorem 3.5. It turns out that one can argue more cheaply, using a
weaker structure theorem analogous to Theorem 3.4. The key observation is that
because $\Lambda_{W,b} 1_A$ is dominated (up to a constant) by the enveloping sieve $\nu$, the
structured component of $\Lambda_{W,b} 1_A$ (which is essentially a convolution of $\Lambda_{W,b} 1_A$ with
a Fejer-like kernel) is pointwise dominated (up to a constant) by a corresponding
structured component of $\nu$. But since $\nu$ is linearly pseudorandom after subtracting
off its mean, the structured component of $\nu$ turns out to essentially be just the
mean of $\nu$, which is bounded. We conclude that $f_{U^\perp}$ is bounded, at which point
one can just apply Szemeredi’s theorem (Theorem 3.1) directly to obtain a good
lower bound on this contribution to (15), and one can now conclude the $k = 3$ case
of Theorem 1.2.

The proof of Theorem 1.2 for general $k$ in m follows the same general strategy,
but it is convenient to abandon the Fourier framework (which becomes quite
complicated for $k > 3$) and instead take an approach which borrows ingredients from
all three approaches, especially the ergodic theory approach. From the Fourier
approach one borrows the Gowers uniformity norms $U^{k-1}(\mathbf{Z}/N\mathbf{Z})$, which are a
convenient way to define the appropriate notion of pseudorandomness for counting
progressions of length $k$. One still needs an enveloping sieve $\nu$, but instead of using
a Selberg-type sieve that enjoys good Fourier coefficient control, it turns out to be
more convenient to use an enveloping sieve® of Goldston and Yildirim ca, cEi, ca
which has good control on $k$-point correlations (indeed, it behaves pseudorandomly
after subtracting off its mean, which is essentially 1).

The next step is a generalized von Neumann theorem to show that the contribution
of pseudorandom functions is negligible. The fact that the functions involved
are no longer bounded by 1, but are instead dominated by $\nu$, makes this theorem
somewhat trickier to establish; however, it can still be achieved by a number of
applications of the Cauchy-Schwarz inequality and taking advantage of the pseudorandomness


^What is essentially happening here is that we are viewing the primes not as a zero density 
subset of the integers, but as a positive density subset of a set of “almost primes” which can be 
controlled efficiently via sieve theory. 

®A related enveloping sieve was also used in the recent establishment of narrow gaps in the
primes m

properties of $\nu - 1$. This type of argument is inspired by certain “sparse counting
lemmas” arising from the hypergraph approach, particularly from m

The main step, as in previous sections, is a structure theorem which decomposes
$\Lambda_{W,b}$ (or $\Lambda_{W,b} 1_A$) into a structured component and a pseudorandom component. In
principle one could use higher order Fourier analysis (or the precise characteristic
factors achieved in m) to obtain this decomposition, but this looks rather
difficult technically, though progress has been made in the $k = 4$ case. Fortunately,
there is a “softer” approach in which one defines structure purely by duality; to
oversimplify substantially, one defines a function to be structured if it is
approximately orthogonal to all pseudorandom functions. One can then obtain a soft
structural theorem in which the structural component is essentially a conditional
expectation of the original function to a certain $\sigma$-algebra generated by certain
special structured functions which are called “dual functions” in m. This $\sigma$-algebra
(the finitary analogue of a characteristic factor) is not too tractable to work with,
but somewhat miraculously, one can utilize the pseudorandomness properties of $\nu$
and a large number of applications of the Cauchy-Schwarz inequality to show that
the conditional expectation of $\nu$ with respect to this $\sigma$-algebra remains bounded
(outside of a small exceptional set, which turns out to have a negligible impact).
Since $\Lambda_{W,b} 1_A$ is pointwise dominated by a constant multiple of $\nu$, the structured
component of $\Lambda_{W,b} 1_A$ is similarly bounded and can thus be controlled using
Szemeredi’s theorem. Combining this with the generalized von Neumann theorem to
handle the pseudorandom component, one obtains Theorem 1.2. The result for the
Gaussian prime constellations is similar, but uses the Gowers cube norms instead
of the uniformity norms, and replaces Szemeredi’s theorem by a hypergraph
removal lemma similar to Lemma 4.1 and Lemma 4.5; see [55].

The arguments used to prove Theorem 1.2 give a lower bound for the expression
(14), but do not compute its asymptotic value (which should be 1). As mentioned
earlier, for $k = 3$ this can be achieved by the circle method. More recently, the
$k = 4$ case has been carried out in Em, ED; the same method in fact allows one
to asymptotically count the number of solutions to any two linear homogeneous
equations in four prime unknowns. The key point is to show that $\Lambda_{W,b} - 1$ is
quadratically pseudorandom, as the generalized von Neumann theorem will then
allow one to control (14) satisfactorily. It turns out that a variant of Lemma 3.6
applies here, and reduces matters to showing that $\Lambda_{W,b} - 1$ does not correlate
significantly with any 2-step nilsequences. This task is attackable by Vinogradov’s
method, although it is rather lengthy and it turns out to be simpler to first replace
$\Lambda_{W,b} - 1$ with the closely related Mobius function.